Download In Memory Database

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Oracle Database wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Concurrency control wikipedia , lookup

Database wikipedia , lookup

Relational model wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Functional Database Model wikipedia , lookup

Clusterpoint wikipedia , lookup

SAP IQ wikipedia , lookup

Database model wikipedia , lookup

Transcript
In Memory Database
Jordan Easley, Kelby Chase, Nirmal
Dharmaratne
Outline
-History
-Introduction
-Features
-Parallelism
-Concurrent Operations
-Reliability and Scalability
-Long term data
-Disadvantages
-Differences between Oracle TimesTen and
SAP HANA
Brief History
•
•
•
•
•
Development first started in 1988 by
White Cross Systems, Limited.
Relational IMDS started in 1993, released
in 1993 by Perihelion Software.
Times-ten acquired by Oracle in 2005.
SAP HANA announced in 2011 - the first
full in-memory database system.
Only recently RAM is cheap enough to be
viable
What is In-Memory
Database?
•
•
A database management system that stores
data entirely on the main memory.
Designed for use with big data and high level
analysis.
Disk Storage
•
Operates by magnetising a film of magnetic
material on a spinning disk.
To retrieve data, the head assembly moves to the
data location while the disk is spinning.
o Access time is limited by variations in physical
location of data.
o
RAM
•
Random Access Memory
o
o
Data stored using transistors and capacitors that
hold a charge.
Allows data to be accessed in any order, regardless
of location.
RAM Speed
•
•
IMDS has
o 4x faster read speeds
o 400x faster writing speeds
Memory Requirements
•
•
A well designed in-memory system can
require less than a 100K of memory.
This is much smaller than traditional
databases which require from 100s of
kilobytes up to many megabytes.
OLAP
•
Online Analytical Processing
o
o
Uses data cubes - databases containing every
possible combination of dimension, measure, and
hierarchy.
Stores pre-processed aggregate data so that it isn't
calculated for every query.
Features of some in memory
databases
-No Aggregate tables
-Column oriented structure
-Limited projections
-Insert only data management
-No caching
-On-the-fly extensibility
-Optimization
-Indexing
-Bulk load mode
Aggregate Data
•
•
•
•
In traditional database design aggregate
data is stored in separate tables to speed up
queries.
As data scales, more and more aggregate
data is added to speed up queries.
This adds complexity to the system.
•
•
SAP HANA has decreased code complexity
by eliminating aggregate data in favor of an
in-memory systems higher speed and
compute all aggregates need "on-the-fly".
This method provides the same or better
performance as traditional systems while
making system maintenance a lot easier
Column Oriented Structure
•
A database management system that stores
data tables as sections of columns of data
rather than rows of data.
Column Oriented Structure
•
•
•
Does not scan each row left-to-right
Retrieves data only from the specified
columns
Each entity has an index corresponding to its
row.
Column oriented serialization
•
•
row oriented databases serialize all values of
in a row together
1,Smith,Joe,40000;
2,Jones,Mary,50000;
3,Johnson,Cathy,44000;
column oriented databases serialize all
values of a column together.
1,2,3;
Smith,Jones,Johnson;
Joe,Mary,Cathy;
40000,50000,44000;
Column vs. Row based structure
•
Row based database
o
o
•
Traditionally used when a system needs all or most
of an entire row.
$SQL = "SELECT * from Customers WHERE Fname
= 'John' AND Lname = 'Smith'"
Column based database
o
o
o
Used for high level analysis and functions
$SQL = "SELECT AVG(Salary) FROM Employees"
Typically takes longer to enter new records
Limited Projections
•
•
Instead of grabbing all information it only
takes needed data.
Limited projections are part of what make
Column base design possible.
Insert Only
•
Data management that does not update or
delete records
Does not use processing time on deleting or
updating.
o Allows companies to archive data.
o
Caching
•
•
Traditional databases may cache the results
of frequently queried data in RAM
Caching is not existent in an in-memory
database
o
Again, it just adds complexity to the system
On-The-Fly Extensibility
•
Allows the adding of new columns to an
existing database without the rewriting of all
pages using the database.
Optimization
• In-memory systems are considered more
optimized than traditional systems
•
On-disk databases have to keep redundant
data for the sake of indexing and caching,
and use CPU power to maintain the cache.
Indexing Techniques
•Hash indexes
•T-tree indexes
Hash indexes
•They specify how many buckets are to be
allocated for the table's hash index.
T-tree indexes
• A typical disk-based RDBMS uses B+-tree
indexes to reduce the amount of disk I/O
needed to accomplish indexed look-up of
data files.
•
T-tree indexes are more economical than Btrees in that they require less memory space
and fewer CPU cycles
T-tree cont.
• T-tree consists of a group of T-tree nodes .
• Each node consists of a group of 64 index
•
•
•
keys, each of which points to a separate
table row.
Node holds an array of keys and three pointers: one to
parent node and two to right and left child respectively
Hold only the value of the left-most key in
each index node.
Other values can be accessed by accessing
the data rows themselves.
•
•
•
The T-tree search algorithm knows with just
two comparisons whether the value it is
searching for is on the current node or
elsewhere in memory.
With every new index node pointer
indirection, its search area is cut in half
They shrink the complexity of code and
reduce code path length by removing
unnecessary computations and taking
advantage of pointer traversal where
T*-tree
-shows the search path used to locate a range
of rows that contain the value ‘58’ in the
column designated as the index key.
T tree vs B tree
Bulk load mode
•
Bulk load mode allows the insert of large
amounts of data without the overhead of
each individual transaction.
Reduction of Layers
•In most in-memory databases, you can run
processing on the database layer in contrast
to traditional database where it is at least 3
tiered application( Web,
Presentation,Processing)
•This is because in in-memory
you don't have
to deal with the network transfer of data over
to your middle tiers, thus speeding up the
Parallelism
•
•
Parallelism is the use of multiple cores within
a application.
Allows multi-core processing of a single
query.
Concurrent operations
•
•
Concurrency is not that different from
traditional databases
Since in-memory databases are faster than
traditional ones, transactions complete
quicker making lock contention not as
important as previously.
Reliability of in-memory systems
•
•
While main memory is more volatile than
disk memory and thus less reliable, a well
designed in-memory system can roll back to
the last successful transaction on a system
crash
Even with this assurance most companies
still feel the need to regular backups on disk
Scalability of In-Memory Systems
•
•
•
In-memory systems have been proven to
scale far beyond the terabyte size range.
In a test an in-memory database grew to
1.17 TB and 15.54 billion rows.
In this test performance remained consistent
as the database grew showing near linear
scalability.
Long term data storage
•
In-memory systems are not considered
useful for storing unused or rarely used data
o
o
SAP calls this passive data
Passive data may be moved to secondary storage
during downtime
Disadvantages of In-Memory
systems
-Cost
-Higher energy use
-Requires more floor space
Why is there a need for a new
system
-Data half life- the amount of time in which a set
of data is useful.
-Latency- the time delay experienced by a
system before data is available for use.
-Traditional systems reaching limits
Differences between SAP HANA
and Oracle TimesTen
-Oracle TimesTen
-Partial in-memory database
-More of a huge cache system
-better system of concurrency for users
-SAP HANA
-Fully functioning in-memory database
-Uses column based storage
-Can use multithreading for individual
queries
http://www.youtube.com/watch?v=4vKOb6HAz-
SAP HANA
Feature
Benefit
Multicore CPU
Large memory footprint
Greater computation power
Faster than disk
Row and column store
Faster aggregation with column store
Compression
Highly dependent on actual data used
Partitioning
Analysis of large data sets
Complex computations
No aggregate tables
Nonmaterialized views
Flexible modeling
No data duplication
Insert only on delta
Fast data loads
Oracle TimesTen
Features
Benefits
Low latency
Microsecond response time
Multi-user concurrency
Durability and Persistence
Transactional parallel replication
Supports SQL and PL/SQL via
ODBC, JDBC, ODP.NET, OCI
and Pro*C/C++
Real time performance
Consistent response time
Automated database failover
Zero data loss
Supports OLTP and analytic
workloads
Hardware
-Both Oracle TimesTen and SAP HANA use
the same hardware
-Intel CPU
-40 Cores
-1 TB of RAM
So In Conclusion...
•
•
•
•
In Memory Computing has become a big deal
because the prices of RAM have fallen in recent
years.
Main memory is much faster than disk storage,
so database design can be different.
Although still more expensive than traditional
design, In Memory Database becomes very
efficient with large amounts of data, especially
when using it for large-scale analysis.
For now, SAP is ahead of the pack in IMDS.