Whitepaper
Innovations in Business Intelligence Database Technology
www.sisense.com

The State of Database Technology in 2015

Database technology has seen rapid developments in the past two decades. Online Analytical Processing (OLAP and its derivatives, MOLAP, ROLAP and HOLAP), which gained prominence in the 1990s, gradually lost ground to in-memory databases at the start of the 21st century. However, the requirements of modern business intelligence pose a challenge that in-memory databases will have a very difficult time meeting. This, in turn, has brought on the next generation of databases and querying: in-chip analytics. This newly developed technology makes use of the CPU, RAM and disk storage in innovative ways in order to tackle the complexity and size of the data sets that current BI software must handle to provide effective insights to end users within a reasonable timeframe.

This guide will cover:
* OLAP cubes: history and overview
* In-memory databases: advantages and shortcomings
* In-chip technology: development, overview and promise

OLAP Cubes

Summary

OLAP technology provided a great basis for business intelligence 20 years ago, but it suffers from several limitations which make it a less than ideal fit for most modern BI projects. It allows users to receive quick answers to specific pre-defined queries, but it is resource intensive and problematic when it comes to larger data sets and ad-hoc querying.

Leading Provider: Oracle

Pros
* Centralized data integration
* Fast data retrieval for specific queries

Cons
* Resource intensive
* Inflexibility, limited support for ad-hoc queries
* Long build times

Overview

History

OLAP is a database technology that was first developed in the late 1960s, but only gained widespread commercial use in the 1990s with Microsoft’s first release of its OLAP Services product (now Analysis Services), based on technology acquired from Panorama Software.
At that point in time, when computer hardware wasn’t nearly as powerful as it is today, OLAP was groundbreaking. It introduced a spectacular way for business users (typically analysts) to easily perform multidimensional analysis of large volumes of business data. As Microsoft’s Multidimensional Expressions language (MDX) came closer to becoming a standard, more and more client tools (e.g., Panorama NovaView, ProClarity) started appearing to provide even more power to these users.

How it Works

An OLAP database converts table-based datasets into multidimensional arrays called cubes in order to optimize querying and data retrieval. Users can then access specific dimensions of the data for analysis purposes.

For a simplified example, let’s think of a chain of pet stores that tracks sales of various items across cities and over time. It might track these figures in a series of spreadsheets, whereas in an OLAP cube the same information would be stored multi-dimensionally. Note that this example is somewhat over-simplified. In reality there can be a virtually endless number of dimensions, which are not necessarily symmetrical.

To answer queries, an OLAP cube typically includes roll-up cells which contain aggregated data, according to certain parameters (in our example, sales over time, or item sales by location). These aggregations are pre-calculated when the system is “at rest” (i.e., not being used by end users). Thus, once a query is made, the answer is already within the data cube and is retrieved instantaneously.

However, OLAP cubes have their drawbacks. The main ones are the following.

Each additional query requires a new dimension to be added to the cube, which means duplicating the entire cube in terms of data storage. As a result, OLAP databases quickly become resource intensive when it comes to data storage and management.
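The roll-up cells and the storage cost described above can be illustrated with a minimal sketch. It assumes a toy cube with three dimensions (city, item, year) held in a Python dictionary; the city names, sales figures and the `build_cube` helper are hypothetical and are not taken from any OLAP product.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sales records: (city, item, year, units_sold).
SALES = [
    ("Boston", "hamster", 2013, 12),
    ("Boston", "hamster", 2014, 18),
    ("Boston", "goldfish", 2013, 40),
    ("Denver", "hamster", 2013, 7),
    ("Denver", "goldfish", 2014, 25),
]

ALL = "*"  # marker for a rolled-up (aggregated) dimension


def build_cube(rows):
    """Pre-calculate every roll-up cell for the three dimensions."""
    cube = defaultdict(int)
    for city, item, year, units in rows:
        # Each record contributes to 2**3 = 8 cells: every combination of
        # keeping a dimension's value or replacing it with ALL.
        for key in product((city, ALL), (item, ALL), (year, ALL)):
            cube[key] += units
    return dict(cube)


# The "build" step: done once, while the system is at rest.
cube = build_cube(SALES)

# Pre-defined queries are now single dictionary lookups.
hamsters_total = cube[(ALL, "hamster", ALL)]
boston_2013 = cube[("Boston", ALL, 2013)]

print(hamsters_total, boston_2013, len(cube))
```

In this sketch the five input records expand into 24 stored cells, and every additional dimension doubles the number of cells each record contributes to. A question outside the pre-calculated sums, such as average sales of hamsters per year, cannot be answered by a lookup alone and would need a rebuild or extra computation.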
Aggregating data requires the CPU to process every cell of the data, which means that each new build (such as when additional data is added) takes a relatively long time to produce.

OLAP cubes are very fast when it comes to specific, pre-designed queries. However, if a user wants to make a new query (e.g., average sales of hamsters per year), this data is not pre-calculated and will require additional dimensions to be added to the cube, which is a lengthy process.

In-Memory Databases

Summary

In-memory technology (i.e., loading the entire database into RAM and from there transferring it to the CPU to perform calculations) has become a leading solution for business intelligence, as it provides users with the ability to receive fast answers to their queries without the need for lengthy builds and pre-calculations. However, the size and complexity of modern data is forcing in-memory databases to face their limitations.

Leading Provider: Qlik

Pros
* Fast data retrieval
* Support for ad-hoc queries

Cons
* Expensive to implement and maintain
* Scalability issues

Overview

History

In-memory databases became popular at the start of the 21st century with the proliferation of cheap and widely available 64-bit PCs and the adoption of columnar databases as an alternative to the row-based systems which were the basis for OLAP cubes.

More RAM on a PC meant that more data could be queried quickly. If crunching a million rows of data on a machine with only 2GB of RAM was a drag, users could now add more gigabytes of RAM to their PCs and store data in relational databases which could be queried much faster than before.

In-memory databases have become much more prominent in recent years. However, OLAP-based solutions can still be found in massive organization-wide implementations.

How it Works

Generally speaking, a computer has two types of data storage mechanisms: disk (often called a hard disk) and RAM (random access memory).
The important differences between them are outlined in the following table:

DISK         RAM
Abundant     Scarce
Slow         Fast
Cheap        Expensive
Long term    Short term

Most modern computers have 15-100 times more available disk storage than they do RAM. However, reading data from disk is much slower than reading the same data from RAM. This is one of the reasons why 1GB of RAM costs approximately 320 times as much as 1GB of disk space.

In a disk-based RDBMS, there are two things that cause heavy disk operations and therefore poor performance:

1. Table Scans: Loading of an entire table from disk to RAM (for calculations)
2. Complex Data: Querying data scattered across many tables and/or fields (joins)

In-memory technology aims to address both these issues by preloading the entire database into RAM, and loading data from RAM to the CPU to perform calculations and data retrieval.

All in-memory technologies share the same premise: that it is simply much faster to perform calculations over data that is stored in RAM than it is when that same data is stored in a table on a disk. These technologies also benefit from the fact that 64-bit computers are currently considered commodity hardware. Additionally, it is cheaper to add more RAM to both commodity and proprietary hardware today than it previously was.

Illustration: Disk/RAM utilization when querying 2 fields

This technology enables a much faster time to value, with significantly less effort and money invested in developing, setting up and maintaining analytics infrastructure.

The problem

In-memory technology performs beautifully at small scales. When datasets are simple and small, it enables speedy development compared to a solution built on top of an RDBMS. However, its main inhibitor to wide enterprise adoption has been scalability. The challenge it continues to face is that RAM, when used to store and analyze raw business data, tends to run out quickly and unexpectedly.
As storage sizes go, RAM is tiny and many data sets these days are too large to fit. Moreover, each query to the database uses up additional RAM for intermediate calculations. Complex scenarios still require that data be extensively modified, or even loaded into an RDBMS data warehouse, prior to being loaded into the memory-based storage. This can happen when data sets are complex and/or when there are many users querying the database simultaneously and repeatedly. In such cases, the added value of such technology is debatable and the cost-saving benefits of using it become less significant.

The fact of the matter is, data sets are getting bigger and bigger, with companies generating more information than ever, both from internal sources and from external ones which business executives look to in order to gain a competitive advantage. This exponential growth in the size of data has not been mirrored by a similar reduction in RAM prices: while RAM is indeed cheaper than it was 15 years ago, it is still relatively expensive storage that cannot be scaled indefinitely without incurring significant costs.

And so, at this point in time it seems that in-memory technology might just have hit its glass ceiling, and can no longer promise reasonable performance considering the amounts and complexity of the data that is currently being gathered, aggregated and analyzed by modern businesses.

ElastiCubes and In-Chip Analytics

Summary

In-Chip Technology is the latest development in database technology. It combines the flexibility of in-memory based querying with the speed and robustness of OLAP cubes, without the hardware costs and difficult implementation of traditional solutions. Although only recently developed and released, In-Chip is quickly gaining popularity due to its increased performance and ability to tackle complex and large data sets.
Leading Provider: Sisense

Advantages
* Fastest data retrieval
* Does not require proprietary hardware or extensive RAM
* Full support for ad-hoc queries

Overview

History

You might not have heard of ElastiCubes In-Chip Technology yet, as it was released for commercial use only a few short years ago. However, it has already become the data analytics platform of choice for such companies as eBay, Samsung and NASA, and is growing rapidly as an alternative and solution to the limitations imposed by traditional OLAP database technologies.

ElastiCube is a unique form of database developed by Sisense, the result of thoroughly analyzing the strengths and weaknesses of both OLAP and in-memory technologies, while taking into consideration the off-the-shelf hardware of today and tomorrow.

The vision was to provide a true alternative to OLAP technology, without compromising the speediness of the development cycle and query response times for which in-memory technologies are lauded. This would allow a single technology to be used in BI solutions of any scale, in any industry.

How it Works

In-Chip Analytics is the latest generation of in-memory technology for business analytics and sets itself apart by being fast as well as scalable. The name ElastiCube comes from the database’s unique ability to stretch beyond the hard limitations imposed by older generation technologies.

This technology employs a disk-based columnar database for storage to provide fast disk reads, and is able to load data from disk to RAM (and vice versa) when needed. The queries themselves are processed entirely in-memory without any disk reads throughout. Most importantly, only a subset of the data is physically stored in RAM at any given time, leaving more space for other operations to take place in parallel. In other words, RAM limitations are not as big an issue as with previous in-memory technologies, as there is no need to keep the entire data set in RAM on a permanent basis.
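The general idea of a disk-based columnar store that brings only the queried columns into RAM can be sketched as follows. This is an illustration of the concept only, not Sisense's implementation: the `ColumnStore` class, its methods, the column names and the figures are all hypothetical, and the sketch assumes integer columns stored one file per column.

```python
import os
import tempfile
from array import array


class ColumnStore:
    """Toy disk-based columnar store: one file per column (hypothetical)."""

    def __init__(self, directory):
        self.directory = directory
        self.in_ram = {}  # column name -> array currently held in memory

    def _path(self, name):
        return os.path.join(self.directory, name + ".col")

    def write_column(self, name, values):
        # Persist one column as a contiguous block of 64-bit integers.
        with open(self._path(name), "wb") as f:
            array("q", values).tofile(f)

    def load(self, name):
        # Read a column from disk only if it is not already in RAM.
        if name not in self.in_ram:
            col = array("q")
            path = self._path(name)
            with open(path, "rb") as f:
                col.fromfile(f, os.path.getsize(path) // col.itemsize)
            self.in_ram[name] = col
        return self.in_ram[name]

    def evict(self, name):
        # Drop a column from RAM; the copy on disk is untouched.
        self.in_ram.pop(name, None)

    def sum_where(self, value_col, filter_col, wanted):
        # Only the two columns the query touches are loaded.
        values, keys = self.load(value_col), self.load(filter_col)
        return sum(v for v, k in zip(values, keys) if k == wanted)


with tempfile.TemporaryDirectory() as tmp:
    store = ColumnStore(tmp)
    store.write_column("units", [12, 18, 40, 7, 25])
    store.write_column("year", [2013, 2014, 2013, 2013, 2014])
    store.write_column("store_id", [1, 1, 1, 2, 2])

    total_2013 = store.sum_where("units", "year", 2013)
    loaded = sorted(store.in_ram)  # columns that ended up in RAM

print(total_2013, loaded)
```

In this sketch the `store_id` column is never read from disk because the query does not reference it, and `evict` shows how a column that is no longer needed could be released from RAM while remaining on disk.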
Keeping only a subset of the data in RAM is achieved via advanced compression, as well as by identifying the parts of the dataset which are not being used on a regular basis and can be left “at rest”; typically this is around 80 percent of the data businesses collect.

In-Chip Technology also has a unique way of handling joins. Instead of joining tables, it uses columnar algebra to merge fields. This way, the join operation can be processed entirely in the CPU cache.

Illustration: Disk/RAM utilization when querying 2 fields

The table below compares RDBMS technology, In-Memory technology and Sisense’s In-Chip Technology across several technical aspects:

Columnar Storage: whether the technology stores data by column rather than by row.

In-Memory Query Processing: whether queries are typically processed in memory, without reads from disk during query execution.

Performance Upon Installation: fast response to queries involving joining, grouping and aggregating data, without lengthy preparation work or specialized configuration.

Data Capacity: whether there is a cap on data capacity, beyond what can be stored on a single hard disk (TBs of data).

Scalability Level: the ability of the technology to support growing data volumes and concurrent usage without having to significantly modify or re-build the solution.

Feature                          RDBMS        In-Memory Associative       In-Chip Technology
Columnar Storage                 Some         No                          Yes
In-Memory Query Processing       No           Yes                         Yes
Performance Upon Installation    Slow         Fast                        Fast
Data Capacity                    Unlimited    Limited (by size and RAM)   Unlimited
Scalability Level                Large scale  Small scale                 Small / Large scale

In-Chip technology further optimizes data processing by making the most of the built-in components of today’s 64-bit commodity hardware.
Using algorithms that run beneath the OS and replace its set of instructions, In-Chip manages to utilize the CPU to its fullest, thus achieving unparalleled performance rates, even on huge, complex data sets that would previously have required massive hardware upgrades to even consider handling.

Illustration: Latencies of CPU cache, RAM and disk storage

Summary: The Future of Databases?

We’ve reviewed three major database technologies employed by BI software in the past few decades: OLAP cubes, in-memory databases, and up-and-coming In-Chip Analytics. As we have seen, both OLAP and in-memory technology suffer from scalability issues, and there are significant doubts as to their ability to provide a reasonable solution for the requirements of 21st century business intelligence, in terms of data size, complexity, and cost to implement.

In-Chip Technology is currently the most advanced way to store and query data in rapidly changing business environments, and is expected to be adopted by more and more companies in coming years.

Want to learn more about In-Chip technology?

Visit sisense.com

Join a Sisense Analytics Expert for a Weekly Live Demo of In-Chip technology at work

Questions, notes, or comments on the contents of this document? We’d love to hear them! Contact us