Whitepaper
Innovations in
Business Intelligence
Database Technology
www.sisense.com
The State of Database
Technology in 2015
Database technology has developed rapidly over the past two
decades. Online Analytical Processing (OLAP and its derivatives,
MOLAP, ROLAP and HOLAP), which gained prominence in the 1990s,
gradually lost ground to in-memory databases at the start of
the 21st century.
However, the requirements of modern business intelligence pose
a challenge that in-memory databases struggle to meet. This, in
turn, has brought on the next generation of databases and
querying: in-chip analytics. This newly developed
technology uses the CPU, RAM and disk storage in innovative
ways to tackle the complexity and size of the data sets that
current BI software must handle in order to provide effective
insights to end users within a reasonable timeframe.
This guide will cover:
OLAP Cubes history and overview
In-memory databases advantages and shortcomings
In-chip technology – development, overview and promise
OLAP Cubes
Summary
OLAP technology provided a great basis for business
intelligence 20 years ago, but suffers from several limitations
which make it a less than ideal fit for most modern BI projects. It
allows users to receive quick answers to specific pre-defined
queries but is resource intensive and problematic when it comes
to larger data sets and ad-hoc querying.
Leading Provider:
Oracle
Pros
Centralized data integration
Fast data retrieval for specific queries
Cons
Resource intensive
Inflexibility, limited support for ad-hoc queries
Long build times
Overview
History
OLAP is a database technology that was first developed in the late 1960s,
but only gained widespread commercial use in the 1990s with
Microsoft’s first release of their OLAP Services product (now Analysis
Services), based on technology acquired from Panorama Software.
At that point in time, when computer hardware wasn’t nearly as
powerful as it is today, OLAP was groundbreaking. It introduced a
spectacular way for business users (typically analysts) to easily
perform multidimensional analysis of large volumes of business data.
When Microsoft’s Multidimensional Expressions language (MDX)
came closer to becoming a standard, more and more client tools (e.g.,
Panorama NovaView, ProClarity) started popping up to provide even
more power to these users.
How it Works
An OLAP database converts table-based datasets into multidimensional arrays called cubes in order to optimize querying and
data retrieval. Users can then access specific dimensions of the data
for analysis purposes.
For a simplified example, let’s think of a chain of pet stores that tracks
sales of various items across cities and over time. It might track these
figures in a series of spreadsheets such as these:
Whereas in an OLAP cube, the same information would be stored
multi-dimensionally:
Note that this illustration is somewhat over-simplified. In reality
there can be a virtually unlimited number of dimensions, which are
not necessarily symmetrical.
To answer queries, an OLAP cube typically includes roll-up cells which
contain aggregated data, according to certain parameters (in our
example, sales over time, or item sales by location). These
aggregations are pre-calculated when the system is “at rest” (i.e. not
being used by end-users).
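The roll-up idea can be sketched in a few lines of Python. The sales figures, city names and the `build_cube` helper below are invented for illustration and are not taken from any OLAP product; the sketch only shows how aggregates can be computed once at build time so that a later query becomes a single lookup.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows for the pet-store example: (item, city, year, units sold).
SALES = [
    ("hamster", "Boston", 2014, 10),
    ("hamster", "Boston", 2015, 12),
    ("hamster", "Denver", 2014, 7),
    ("parrot", "Boston", 2014, 3),
    ("parrot", "Denver", 2015, 5),
]

ALL = "*"  # marker for a rolled-up (aggregated) dimension


def build_cube(rows):
    """Pre-calculate every roll-up cell: each dimension is either kept or replaced by ALL."""
    cube = defaultdict(int)
    for item, city, year, units in rows:
        for key in product((item, ALL), (city, ALL), (year, ALL)):
            cube[key] += units
    return dict(cube)


cube = build_cube(SALES)

# Answering a pre-defined query is now a dictionary lookup, not a scan of the rows.
hamsters_in_boston = cube[("hamster", "Boston", ALL)]
total_2014 = cube[(ALL, ALL, 2014)]
grand_total = cube[(ALL, ALL, ALL)]
```

Every fact row is written into eight cells here (two choices for each of three dimensions), which hints at why build time and storage grow as dimensions are added.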
Thus, once a query is made, the answer is already within the data cube
and retrieved instantaneously. However, OLAP cubes have their
drawbacks, the main ones being:
1. Each additional query requires a new dimension to be added to the
cube, which means duplicating the entire cube in terms of data
storage. As a result, OLAP databases quickly become
resource intensive when it comes to data storage and
management.
2. Aggregating data requires the CPU to process every cell of the
data, which means that each new build (such as when additional
data is added) takes a relatively long time to produce.
3. OLAP cubes are very fast when it comes to specific, pre-designed
queries. However, if a user wants to make a new query (e.g., average
sales of hamsters per year), this data is not pre-calculated and will
require additional dimensions to be added to the cube, which is a lengthy
process.
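A back-of-the-envelope calculation gives a feel for how quickly storage and build work can grow. The sketch assumes a fully materialized cube in which every dimension can also be rolled up to a single "all" member, so the upper bound on cells is the product of (cardinality + 1) over all dimensions. The dimension sizes and the `potential_cells` helper are hypothetical; real cubes are usually sparse and store far fewer cells than this bound.

```python
from math import prod


def potential_cells(cardinalities):
    """Upper bound on cells in a fully materialized cube.

    Each dimension contributes its distinct values plus one roll-up ("all") member.
    """
    return prod(c + 1 for c in cardinalities)


# Hypothetical dimension sizes: 50 items, 20 cities, 36 months.
base = potential_cells([50, 20, 36])

# Adding a fourth dimension (say, 10 customer segments) multiplies the bound.
extended = potential_cells([50, 20, 36, 10])

growth_factor = extended / base
```

Under these assumptions, a single extra dimension with ten members makes the bound eleven times larger.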
In-Memory Databases
Summary
In-memory technology – i.e., loading the entire database
into RAM and from there transferring it to the CPU to perform
calculations – has become a leading solution for business
intelligence, as it provides users with the ability to receive fast
answers to their queries, without the need for lengthy builds and
pre-calculations; but the size and complexity of modern data is
forcing in-memory databases to face their limitations.
Leading Provider:
Qlik
Pros
Fast data retrieval
Support for ad-hoc queries
Cons
Expensive to implement and maintain
Scalability issues
Overview
History
In-memory databases became popular at the start of the 21st century
with the proliferation of cheap and widely available 64-bit PCs and the
adoption of columnar databases as an alternative to the row-based
systems which were the basis for OLAP cubes.
More RAM on a PC meant that more data could be quickly queried. If
crunching a million rows of data on a machine with only 2GB of RAM
was a drag, users could now add more gigabytes of RAM to their PCs
and store data in relational databases which could be queried much
faster than before.
In-memory databases have become much more prominent in recent
years. However OLAP-based solutions can still be found in massive
organization-wide implementations.
How it Works
Generally speaking, a computer has two types of data storage
mechanisms – disk (often called a hard disk) and RAM (random access
memory).
The important differences between them are outlined in the following
table:
DISK        RAM
Abundant    Scarce
Slow        Fast
Cheap       Expensive
Long term   Short term
Most modern computers have 15-100 times more available disk
storage than they do RAM.
However, reading data from disk is much slower than reading the
same data from RAM. This is one of the reasons why 1GB of RAM
costs approximately 320 times that of 1GB of disk space.
In a disk-based RDBMS, there are two things that cause heavy disk
operations and therefore poor performance:
1. Table Scans: Loading of an entire table from disk to RAM (for
calculations)
2. Complex Data: Querying data scattered across many tables
and/or fields (joins)
In-memory technology aims to address both these issues by preloading the entire database into RAM, and loading data from RAM to
the CPU to perform calculations and data retrieval.
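As a rough illustration of this premise, the sketch below uses Python's built-in `sqlite3` module with the special `":memory:"` database name, which keeps the whole database in RAM. SQLite is a row-oriented engine and is not one of the BI products discussed in this guide, and the table and column names are invented. The point is only that an ad-hoc aggregate can be answered directly from RAM without a pre-built cube.

```python
import sqlite3

# ":memory:" tells SQLite to keep the entire database in RAM (no disk file).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, city TEXT, year INTEGER, units INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("hamster", "Boston", 2014, 10),
        ("hamster", "Boston", 2015, 12),
        ("hamster", "Denver", 2014, 7),
        ("parrot", "Boston", 2014, 3),
    ],
)

# An ad-hoc question nobody planned for: average hamster units per year.
rows = conn.execute(
    "SELECT year, AVG(units) FROM sales WHERE item = ? GROUP BY year ORDER BY year",
    ("hamster",),
).fetchall()
conn.close()
```

Nothing was aggregated ahead of time; the query scans the in-RAM table when it is asked.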
All in-memory technologies share the same premise: that it is simply
much faster to perform calculations over data that is stored in RAM
than it is when that same data is stored in a table on a disk. These
technologies also benefit from the fact that 64-bit computers are
currently considered commodity hardware. Additionally, it is relatively
cheaper to add more RAM to both commodity and proprietary
hardware today than it previously was.
Illustration: Disk/RAM utilization when querying 2 fields
This technology enables a much faster time to value and significantly
less effort and money invested in developing, setting up and
maintaining analytics infrastructure.
The problem
In-memory technology performs beautifully at small scales. When
datasets are simple and small, it enables speedy development
compared to a solution built on top of an RDBMS.
However, its main inhibitor to wide enterprise adoption has been
scalability. The challenge it continues to face is that RAM, when used
to store and analyze raw business data, tends to run out quickly and
unexpectedly. As storage sizes go, RAM is tiny and many data sets
these days are too large to fit. Moreover, each query to the database
uses up additional RAM for intermediate calculations.
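The arithmetic behind this can be sketched as follows. The model is deliberately naive and every number in it is a made-up assumption: the whole data set is resident in RAM, and each concurrent query needs its own scratch space for intermediate results. The `fits_in_ram` helper is hypothetical.

```python
def fits_in_ram(dataset_gb, ram_gb, concurrent_queries, working_gb_per_query):
    """Return (fits, required_gb) for a naive all-in-RAM deployment.

    Assumes the whole data set is resident and every running query needs
    its own scratch space for intermediate results.
    """
    required_gb = dataset_gb + concurrent_queries * working_gb_per_query
    return required_gb <= ram_gb, required_gb


# Hypothetical server with 64 GB of RAM holding a 48 GB data set.
one_user = fits_in_ram(48, 64, concurrent_queries=1, working_gb_per_query=2)
many_users = fits_in_ram(48, 64, concurrent_queries=10, working_gb_per_query=2)
```

With one user the data set fits comfortably; with ten concurrent queries the same server is short of memory even though the data itself has not grown.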
Complex scenarios still require that data be extensively modified, or
even loaded into an RDBMS data warehouse, prior to being loaded
into the memory-based storage. This can happen when data sets are
complex and/or when there are many users querying the database
simultaneously and repeatedly. In such cases, the added value of
such technology is debatable and cost-saving benefits of using it
become less significant.
The fact of the matter is, data sets are getting bigger and bigger, with
companies generating more information than ever – both from
internal sources and from external ones which business executives
look to in order to gain a competitive advantage. This exponential
growth in the size of data has not been mirrored by a similar reduction
in RAM prices – while it is indeed cheaper than it was 15 years ago, it’s
still relatively expensive storage that cannot be scaled indefinitely
without incurring significant costs.
And so, at this point in time it seems that in-memory technology might
just have hit its glass ceiling, and can no longer promise reasonable
performance considering the amounts and complexity of the data that
is currently being gathered, aggregated and analyzed by modern
businesses.
ElastiCubes and In-Chip Analytics
Summary
In-Chip Technology is the latest development in database
technology. It combines the flexibility of in-memory based
querying with the speed and robustness of OLAP cubes, without
the hardware costs and difficult implementation of traditional
solutions. Although only recently developed and released, In-Chip is quickly gaining popularity due to its increased
performance and ability to tackle complex and large data sets.
Leading Provider:
Sisense
Advantages
Fastest data retrieval
Does not require proprietary hardware or extensive RAM
Full support for ad-hoc queries
Overview
History
You might not have heard of ElastiCubes In-Chip Technology yet, as it
was released for commercial use only a few short years ago.
However, it has already become the data analytics platform of choice
for such companies as eBay, Samsung and NASA and is growing
rapidly as an alternative and solution to the limitations imposed by
traditional OLAP database technologies.
ElastiCube is a unique form of database developed by Sisense, the
result of thoroughly analyzing the strengths and weaknesses of both
OLAP and in-memory technologies, while taking into consideration
the off-the-shelf hardware of today and tomorrow.
The vision was to provide a true alternative to OLAP technology,
without compromising the speediness of the development cycle and
query response times for which in-memory technologies are lauded.
This would allow a single technology to be used in BI solutions of any
scale, in any industry.
How it Works
In-Chip Analytics is the latest generation of in-memory technology for
business analytics and sets itself apart by being fast as well as
scalable. The name ElastiCube comes from the database’s unique
ability to stretch beyond the hard limitations imposed by older
generation technologies.
This technology employs a disk-based columnar database for storage
to provide fast disk reads and is able to load data from disk to RAM
(and vice versa) when needed. The queries themselves are
processed entirely in-memory without any disk-reads throughout.
And most importantly, there is only a subset of the data physically
stored in RAM at any given time, leaving more space for other
operations to take place in parallel – in other words, RAM limitations
are not as big an issue as with previous in-memory technologies, as
there is no need to keep the entire data in RAM on a permanent basis.
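The general pattern described here, columns stored on disk and only the ones a query touches held in RAM, can be sketched with the Python standard library. This is a toy model under stated assumptions and not Sisense's implementation: the `ColumnStore` class, the file layout and the least-recently-used eviction policy are all hypothetical, and compression is left out entirely.

```python
import os
import tempfile
from array import array
from collections import OrderedDict


class ColumnStore:
    """Toy disk-based columnar store that keeps at most `max_cached` columns in RAM."""

    def __init__(self, directory, max_cached=2):
        self.directory = directory
        self.max_cached = max_cached
        self.cache = OrderedDict()  # column name -> array, least recently used first

    def _path(self, name):
        return os.path.join(self.directory, name + ".col")

    def write_column(self, name, values):
        # Each column lives in its own file, so reading one never touches the others.
        with open(self._path(name), "wb") as f:
            array("q", values).tofile(f)

    def column(self, name):
        if name in self.cache:
            self.cache.move_to_end(name)  # mark as recently used
            return self.cache[name]
        path = self._path(name)
        data = array("q")
        with open(path, "rb") as f:
            data.fromfile(f, os.path.getsize(path) // data.itemsize)
        self.cache[name] = data
        if len(self.cache) > self.max_cached:
            self.cache.popitem(last=False)  # evict the least recently used column
        return data


with tempfile.TemporaryDirectory() as tmp:
    store = ColumnStore(tmp, max_cached=2)
    store.write_column("units", [10, 12, 7, 3])
    store.write_column("year", [2014, 2015, 2014, 2014])
    store.write_column("store_id", [1, 1, 2, 1])

    # A query over two fields loads only those two columns into RAM.
    units, year = store.column("units"), store.column("year")
    units_2014 = sum(u for u, y in zip(units, year) if y == 2014)
    cached_after_query = list(store.cache)

    # Touching a third column evicts the least recently used one ("units").
    store.column("store_id")
    cached_after_eviction = list(store.cache)
```

The cache limit stands in for the RAM budget: data that is not being queried stays on disk and costs no memory.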
This is achieved via advanced compression as well as identification of
the parts of the dataset which are not being used on a regular basis
and can be left “at rest” – typically this is around 80 percent of the
data businesses collect.
In-Chip Technology also has a unique way of handling joins. Instead
of joining tables, it uses columnar algebra to merge fields.
This way, the join operation can be processed entirely in the CPU
cache.
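The details of this columnar algebra are not described in this guide, so the sketch below illustrates only the general idea of a column-wise join, using a plain hash lookup as a stand-in technique. Matching is done on the two key columns alone, and other fields are gathered only if the query needs them. All table names, field names and the `join_positions` helper are hypothetical.

```python
# Column-oriented tables: each field is its own list (all names are hypothetical).
sales = {
    "store_id": [1, 2, 1, 3],
    "units": [10, 7, 12, 5],
}
stores = {
    "store_id": [1, 2, 3],
    "city": ["Boston", "Denver", "Austin"],
}


def join_positions(left_keys, right_keys):
    """Match rows using only the two key columns.

    Returns, for each left row, the position of the matching right row
    (or None). Assumes right_keys are unique, as in a dimension table.
    """
    lookup = {key: pos for pos, key in enumerate(right_keys)}
    return [lookup.get(key) for key in left_keys]


positions = join_positions(sales["store_id"], stores["store_id"])

# Only now gather the one extra field the query needs; other columns stay untouched.
units_by_city = {}
for pos, units in zip(positions, sales["units"]):
    if pos is None:
        continue  # no matching store; an inner join drops the row
    city = stores["city"][pos]
    units_by_city[city] = units_by_city.get(city, 0) + units
```

Because the join works on two small arrays of keys instead of whole rows, the working set stays compact, which is the property that makes cache-resident processing plausible.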
Illustration: Disk/RAM utilization when querying 2 fields
The table below compares RDBMS technology, In-Memory
technology and Sisense’s In-Chip Technology across several
technical aspects:
Columnar Storage: whether the technology supports storage of
columns rather than tables.
In-Memory Query Processing: whether the technology typically
processes queries in memory, without reads from disk during query execution.
Performance Upon Installation: Fast query response to queries
involving joining, grouping and aggregating data – without lengthy
preparation work or specialized configuration.
Data Capacity: whether there is a cap on data capacity beyond what can be
stored on a single hard disk (TBs of data).
Scalability Level: The ability of the technology to support growing
data volumes and concurrent usage without having to significantly
modify/re-build the solution.
Feature                        RDBMS        In-Memory Associative      In-Chip Technology
Columnar Storage               Some         No                         Yes
In-Memory Query Processing     No           Yes                        Yes
Performance Upon Installation  Slow         Fast                       Fast
Data Capacity                  Unlimited    Limited (by size and RAM)  Unlimited
Scalability Level              Large scale  Small scale                Small / Large scale
In-Chip technology further optimizes data processing by making the
most of the built-in components of today’s 64-bit commodity
hardware. Using algorithms that run beneath the OS and replace its
set of instructions, In-Chip manages to utilize the CPU to its fullest,
thus achieving unparalleled performance rates – even on huge,
complex data sets that would previously have required massive
hardware upgrades to even consider handling.
Illustration: Latencies of CPU cache, RAM and disk storage
Summary: The Future of
Databases?
We’ve reviewed three major database technologies employed by BI
software in the past few decades: OLAP cubes, in-memory databases,
and the up-and-coming In-Chip Analytics.
As we have seen, both OLAP and in-memory technology suffer from
scalability issues, and there are significant doubts as to their ability to
provide a reasonable solution for the requirements of 21st century
business intelligence, in terms of data size, complexity, and cost to
implement.
In-Chip Technology is currently the most advanced way to store and
query data in rapidly changing business environments, and is
expected to be adopted by more and more companies in coming
years.
Want to learn more about In-Chip technology?
Visit sisense.com
Join a Sisense Analytics Expert for a Weekly Live Demo of In-Chip
technology at work
Questions, notes, or comments on the contents of this document? We’d love
to hear them! Contact us