Optimizing Large Data Handling in SAP® ASE for
Better Performance
TABLE OF CONTENTS
Overview
SAP Adaptive Server Enterprise and VLDB
Overhauling SAP ASE to Prepare for VLDB
Optimizer Rewrite
Data Storage & Devices
Multiple Tempdbs & RLC
Data & Index Partitioning
Partitions & Database Maintenance
Partitions & Query Parallelism
Introduce New Techniques to Improve Performance
In-memory Database
Improved Join Methods – Enhance Star Joins
Better Handling of LOB Data Providing Improved Performance
Online Index Creation
Enhanced Insert-select Performance
Storage
Data Compression/In-row LOB Compression
Backup Compression – Compressing a Dump
Deferred Table Creation
Scale Up
Metadata and Latch Management Enhancements
Lock Management Enhancements
Run-time Logging Enhancements
Summary
OVERVIEW
Data is becoming the lifeline of any organization that wants to maintain its competitive advantage in the industry.
In today’s fast-paced and competitive world, insights derived from data enable organizations to make the right decisions
at the right time, optimize their operations, and provide better products and services to customers. Traditional database
growth has been attributed to various factors, including indexes built in the database, backups, and
new compliance requirements, such as Sarbanes-Oxley and HIPAA, that warrant retaining historical data.
With large amounts of data constantly flowing into the organization, IT departments have to ensure that there is sufficient
storage and processing power, while DBAs have to make sure the database is well designed and tuned so that queries
run fast without encountering storage-related bottlenecks or other issues. These organizations increasingly
depend on the database provider to optimize the database and deliver better performance within a stable database
environment.
SAP ADAPTIVE SERVER ENTERPRISE AND VLDB
SAP Adaptive Server® Enterprise (ASE) is a versatile, high-performance, enterprise-class relational database
management system that is well suited for mission-critical, data-intensive environments. It is designed to handle the
most demanding OLTP (online transaction processing) environments and offers refined capabilities, among products
not dedicated to business warehousing, for decision support and analytical workloads.
When considering the characteristics of Very Large Database (VLDB), the following often stand out as key requirements,
in addition to the simplistic “large” storage one:
•• High user concurrency/high transaction concurrency;
•• High or continuous availability;
•• User concurrency even during bulk operations, such as bulk feeds or archival processing;
•• Large/Very Large SMP servers with massive amounts of memory and CPU core counts;
•• Complex query optimization and parallel query features for operational reporting.
OVERHAULING SAP ASE TO PREPARE FOR VLDB
Although SAP ASE had done very well in the performance and management area with its support for row-level locking,
dynamic memory allocation and distributed joins, it showed some limitations in the way it handled very large databases.
To address these concerns, SAP has progressively introduced new functionality changing the way SAP ASE handles
large databases. Starting with SAP ASE 15, a complete rewrite of the optimizer, along with the introduction of partitioning
and improvements in parallelism, helped alleviate some of the performance problems encountered when
executing complex queries on very large data sets.
Optimizer Rewrite
SAP ASE 15 saw a major improvement as the optimizer went through a complete overhaul. The old pre-15 optimizer was
focused on OLTP, had limited support for analytics-oriented query optimization techniques, and offered no effective means
to control the optimizer beyond simplistic index and join order forcing. The SAP ASE 15 optimizer therefore added a
number of new features, including:
•• Configurable optimizer controls and optimization goals that could be set on a server-wide, session, or query basis;
•• Detailed query-level execution plan syntax that controlled which optimization and execution strategies would be
used, beyond the simplistic index and join order forcing;
•• Better and more effective support for parallel threads, eliminating the thread “explosion” found in previous releases
and increasing the number of concurrent users who could benefit from parallel query;
•• Improved query optimization diagnostics and statistics;
•• Additional join and group-by processing techniques better adapted to reporting-type applications;
•• In-memory sorting techniques to improve overall query response times;
•• Improved costing algorithms for better index selection, join orders and strategies.
The most noticeable change was delivered by the optimizer controls and optimization goals. SAP ASE now has
optimization goals, which provide hints to the optimizer to choose the right query plan for executing the query. Here are
the optimization goals, which can be set at a server, session or query level:
•• allrows_mix – the default goal, and the most useful in a mixed-query environment; it balances the needs of
OLTP and DSS query environments;
•• allrows_oltp – the most useful goal for purely OLTP queries;
•• allrows_dss – the most useful goal for operational DSS queries of medium-to-high complexity.
These optimization goals operate at a higher level of abstraction. At a lower level, SAP ASE provided optimization
controls (criteria) that allowed users to selectively disable different join, grouping, distinct processing, sort strategies
or other specific query execution methods, as desired. Alternatively, a technique could be enabled under an optimization
goal that normally would not use it. For example, a DSS-style hash join could be enabled for an application using the
allrows_oltp optimization goal. These controls could be enabled for a session or for an individual query, as sketched below.
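The following is a minimal sketch of these controls in use; the table and column names are hypothetical, and the exact set of criteria should be verified against the documentation for your ASE version.

    -- Server-wide default optimization goal:
    sp_configure 'optimization goal', 0, 'allrows_mix'
    go

    -- Session-level goal:
    set plan optgoal allrows_oltp
    go

    -- Selectively enable one criterion (here, hash join) under allrows_oltp:
    set hash_join 1
    go

    -- Query-level goal via the abstract plan clause:
    select c.cust_name, sum(o.amount)
    from customers c, orders o
    where c.cust_id = o.cust_id
    group by c.cust_name
    plan '(use optgoal allrows_dss)'
    go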
Along with these improvements, the SAP ASE optimizer provided cost-based pruning mechanisms that use estimated
sub-plan costs to avoid analyzing expensive sub-plans. Throughout the remainder of the SAP ASE 15.x platform lifecycle,
the optimizer was continually improved, to the point that shortly after announcing support for SAP applications, SAP
ASE began claiming benchmark records with the standard SAP SD benchmark. SAP ASE was outperforming
competitors’ solutions, despite their more than 20 years of working with SAP applications and their numerous optimizer
hints implemented specifically to support an ERP application.
Data Storage & Devices
As data volume increases, the database size increases. This translates into a larger number of database devices,
since the number needed depends on how large each device can be. Prior to SAP ASE 15, SAP
ASE could only support up to 256 devices of 32GB each. In addition, each database was limited to two billion pages,
resulting in a theoretical database size limit of 32TB. In reality, however, because of the 256-device, 32GB-per-device
limit, the real database size was restricted to 8TB. With many VLDBs in the tens of terabytes, the 8TB limit posed a challenge.
The first change in SAP ASE 15 was to increase the maximum number of devices from 256 to over two billion
(2,147,483,648) and the maximum size of a database device from 32GB to 4TB. This allowed a single database to hit the
theoretical limit of 32TB based on two billion pages of 16KB. The second change came when unsigned integer types were
added and the page ID was treated as an unsigned integer. That meant a single database could contain four billion pages,
for a total of 64TB, and the server could support over one exabyte of data across multiple databases.
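As a sketch of the post-15 limits in practice, a DBA could create devices far larger than the old 32GB cap; the device names, paths and sizes below are illustrative:

    -- Create a large database device:
    disk init
        name = 'data_dev01',
        physname = '/sybase/devices/data_dev01.dat',
        size = '512G'
    go

    -- Build a database across such devices:
    create database bigdb
        on data_dev01 = '512G'
        log on log_dev01 = '64G'
    go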
These changes were a good first step, but a number of major issues still needed to be tackled to effectively handle
large data volumes, including the ability to perform maintenance on such large systems and to control parallelism.
Multiple Tempdbs & RLC
Temporary databases in SAP ASE have long been used for a number of reasons. First, SAP ASE used tempdb for
worktables created during query processing, for example when executing complex queries or when sorting was
necessary. Second, developers often used tempdb to store interim result sets when breaking up large, complex
processing into small discrete steps that could more easily be implemented.
Initially, SAP ASE only supported a single tempdb. This quickly became a bottleneck under high concurrency, simply due
to the log semaphore contention. It also caused a single point of failure in large OLTP VLDB systems. If a user working
with a large table generated an interim result, it could fill the available tempdb space, bringing all processing to an
immediate halt. As a result, prior to SAP ASE 15, SAP ASE implemented multiple tempdbs, supporting a single system
tempdb and multiple user defined tempdbs. When using multiple tempdbs, SAP ASE allowed DBAs to define tempdb
‘groups’ or subsets of multiple tempdbs. Individual users or applications could then be bound to the different tempdb
groups. Each session would be round-robin assigned to one of the tempdbs that was a member of that group. In doing
so, each application could be isolated from the impacts of other applications and also different users could be isolated
from the impacts of others. If a single tempdb was filled, only those users currently assigned to that tempdb would be
affected. Of course, SAP ASE also introduced a tempdb space resource governor that allowed DBAs to put limits on how
much tempdb space a single user could consume. When the limit was reached the DBA could either kill that query or warn
the user about the limits before the query would affect other users.
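A minimal sketch of such a setup; the device, database and login names are hypothetical:

    -- Create an additional user-defined temporary database:
    create temporary database tempdb_rpt on tempdb_dev01 = '20G'
    go

    -- Register it as a member of the default tempdb group:
    sp_tempdb 'add', 'tempdb_rpt', 'default'
    go

    -- Or bind a specific login directly to it:
    sp_tempdb 'bind', 'lg', 'report_login', 'DB', 'tempdb_rpt'
    go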
While useful, the implementation of multiple tempdbs was still not enough to completely address the needs of VLDBs,
and further enhancements were implemented. Prior to SAP ASE 15, the system catalog tables used page locks even if the
server’s default locking scheme was set to row-level locking. Page-level catalog locking caused a huge problem for all
types of applications, but especially for OLTP VLDBs. Most OLTP VLDBs had large user populations and were used in a
mixed-workload fashion, supporting both pure online transactions and operational reporting. The mix of high
concurrency and complex queries resulted in a lot of lock contention in tempdb. The use of multiple tempdbs helped, as
the contention could be spread among multiple instances; however, in some VLDB scenarios, the remaining contention
was such that deployment sites were using tens of tempdbs in an attempt to reduce the catalog contention.
In addition, when a procedure needed to be re-resolved in systems with a lot of stored procedures or views, SAP ASE
had to update system tables in the user database with new procedure tree information. These tables became a source of
contention under highly concurrent usage, which is typical of OLTP VLDBs.
To address these concerns, SAP ASE 15 moved all of the system catalog tables from page-level to row-level locking.
Row Locked Catalogs (RLC), as this was called, resolved not only the tempdb contention but also the contention on
sysprocedures in user databases. In addition, it aided database maintenance: concurrent update statistics and similar
commands used to block on systabstats/sysstatistics, a real problem for VLDBs that forced DBAs to run admin
commands one table at a time on the larger tables.
Data & Index Partitioning
One of the key features introduced in SAP ASE to handle very large databases is table partitioning.
As a table grows, it becomes more and more cumbersome to query and manage. Partitions in SAP ASE
help by allowing the DBA to divide tables and indexes into smaller, more manageable chunks. This allows
faster, easier data access and efficient manageability for maintenance activities. SAP ASE supports horizontal
partitioning, in which a specific set of table rows can be distributed to different partitions, which can in turn be placed on
different disk devices.
The following are the partitioning strategies offered in SAP ASE:
•• Hash partitioning (semantic) – a system-supplied hash function determines the partition assignment for each row;
•• List partitioning (semantic) – values in key columns are compared with sets of user-supplied values specific to each
partition. Exact matches determine the partition assignment;
•• Range partitioning (semantic) – values in key columns are compared with a user-supplied set of upper and lower
bounds associated with each partition. Key column values falling within the stated bounds determine the partition
assignment;
•• Round-robin partitioning – rows are assigned to partitions in a round-robin manner, without reference to column
values, so that each partition contains a more or less equal number of rows. This is the default strategy.
Prior to SAP ASE 15, SAP ASE only supported round-robin partitioning. Initially, this was done to alleviate contention on
high-insert heap tables (or tables with clustered indexes on monotonic sequences) caused by multiple concurrent
sessions competing for the last page. While this helped high-speed OLTP systems, it proved to be a pain point
for VLDB operations in several ways. First, it was not possible to control which partition bulk feeds were loaded
into, and archive processes could not reliably predict which partitions held which data. In addition, since the data could be in
any partition, parallel query methods inefficiently ended up spawning MxN threads when performing joins in parallel.
SAP ASE 15 added semantic partitioning, in which data placement is based on selected column values instead of the
session. Using semantic partitioning on tables proved beneficial for reducing maintenance time and improving
parallel query efficiency, as the following sketch illustrates.
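A minimal sketch of semantic partitioning, with hypothetical table, segment and partition names:

    -- Range partitioning by month; each partition may live on its own segment:
    create table sales (
        sale_id   int      not null,
        sale_date datetime not null,
        amount    money    not null
    )
    partition by range (sale_date) (
        p2014_01 values <= ('20140131') on seg1,
        p2014_02 values <= ('20140228') on seg2,
        p2014_03 values <= ('20140331') on seg3
    )
    go

    -- Hash partitioning spreads rows evenly by key:
    create table order_lines (
        order_id int not null,
        line_no  int not null
    )
    partition by hash (order_id) (p1 on seg1, p2 on seg2, p3 on seg3, p4 on seg4)
    go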
Partitions & Database Maintenance
It is an unfortunate reality that database tables require both logical and physical maintenance. At a logical level, older
rows may need to be archived or daily bulk feeds loaded. From a physical perspective, tables need their
index statistics maintained and may need reorganization due to fragmentation or other attributes. With VLDB systems,
maintenance on a single large glob of a table is grossly inefficient. Most of the data, probably 90% of it or more, is not
affected by either logical (archive/bulk feed) or physical (update stats/reorg) changes. Yet some of the
physical operations would move that static 90% of the data around, or scan it for statistics updates, while archive processes
would perform huge deletes or similar operations on it.
By partitioning a table, maintenance can be made much easier. First, physical maintenance commands such
as reorg or update statistics can be run on a single partition, reducing the time and impact by multiple orders of
magnitude. As an example, a single update statistics command that would have run for eight hours on an un-partitioned
table could now run in less than 30 seconds, on just the partitions that required it. Second, logical operations such
as archiving could leverage truncate partition instead of huge deletes that would have taken much longer.
Additionally, bulk feeds could be targeted precisely at the correct partitions, something that was impossible with
the round-robin partitioning of pre-15 servers, and archive processes could target the desired partitions more selectively.
This made large tables more manageable from both a pure database administration standpoint and the business
application perspective.
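A sketch of partition-scoped maintenance against the hypothetical sales table above:

    -- Maintain statistics on just the partition that changed:
    update statistics sales partition p2014_03
    go

    -- Archive a month by truncating its partition instead of deleting rows:
    truncate table sales partition p2014_01
    go

    -- Reclaim space in a single partition rather than the whole table:
    reorg compact sales partition p2014_02
    go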
With the addition of split, merge and move partition capabilities, SAP ASE provided useful tools for improving physical
maintenance. For example, it might be useful to have monthly partitions for the current and previous year’s data.
However, if data retention requirements demand that 10 years of data be preserved, you quickly end up with hundreds
of partitions, which, while functional, may lead to administrative headaches and query processing overhead.
Using merge partition, partitions in the three-to-five-year-old range could be merged into quarterly partitions, while
partitions older than five years could be merged into yearly partitions. In addition, as the data ages, it could be moved to
lower-cost storage, since it might not be accessed as often and the performance of lower-cost storage may be adequate for
those requirements.
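A sketch of such consolidation, with hypothetical partition and segment names; the exact merge/move syntax should be checked against your ASE version:

    -- Merge three monthly partitions into one quarterly partition:
    alter table sales
        merge partition p2010_01, p2010_02, p2010_03 into p2010_q1
    go

    -- Move an aging partition to a segment on lower-cost storage:
    alter table sales
        move partition p2009_q4 to slow_seg
    go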
Partitioning was further improved in SAP ASE 16 with the addition of partition locking. Traditionally, SAP ASE has
supported table-level, page-level and row-level locks. When large modifications within a single partition exceeded the
page or row lock promotion threshold, SAP ASE often acquired a table lock, which essentially restricts concurrent DML
activity on the entire table, no matter which partition. This was problematic: large inserts from daily bulk feeds would
block users from looking at yesterday’s or earlier data, and large deletes performed by archive operations had a similar
impact. With data flowing in from all directions, intelligent analysis of this data requires quick and continuous access.
SAP ASE therefore introduced partition-level locks, which enable lock promotion at the partition level and allow
DDL/DML access to other partitions while concurrent activity is taking place on one partition of the same table.
Partitions & Query Parallelism
As mentioned previously, with the pre-15 partitioning, which was based strictly on round-robin partitions, when
performing parallel queries involving joins, there was a veritable explosion of worker processes as each worker process
on the outer table had to spawn additional worker processes for each of the inner table join materialization probes. This
was the dreaded M-x-N thread explosion problem. For example, if a table of four partitions was joined with a table of 16
partitions, the resulting query would often use 48 worker processes. Each of the four worker processes in the outer table
would spawn 16 additional worker threads to scan each of the 16 inner table partitions. This was necessary because,
without semantic partitioning, the joining data could be in any one of the partitions. Because only a limited number of
CPU cores is available at a given moment, and there are limits to the number of worker processes a single core can support,
a single parallel query (or at most three) often wiped out all available worker processes and totally consumed
the machine. This would negatively impact other users, even on some of the largest SMP boxes available. Consider that just
three queries at ~48 threads each, pushing close to 150 worker threads in total, would likely swamp a 48-core host completely.
With semantic partitioning, a vector parallelism approach could be used, quickly dropping the number of worker
processes from MxN to M+N in the worst case. Take again a table with four partitions joining a table of 16
partitions, both partitioned on the same key. SAP ASE 15 would start off with four worker threads for the outer table, as
before. However, due to partition elimination, each of the four worker threads would only need to spawn one additional
worker thread per qualifying partition, based on the partition key. The net result would be a single worker
process per inner table partition instead of multiple, for a total of 20 worker threads
instead of the 48 used in our previous example. This number could be reduced even further: partition
elimination can also apply to the outer table of the query. As a result, the outer table might more commonly use only one
or two worker threads, for a total worker thread consumption of four to eight per query. Instead of
two to three users exhausting a 150-worker-thread pool, it would now take 20 to 30 queries to reach the same
level of resource utilization. This means more users could benefit from query parallelism, since those using
parallel query features would have considerably less impact on those who are not.
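Parallelism itself remains bounded by server and session settings; a sketch, with illustrative values:

    -- Server-wide worker thread pool and maximum degree of parallelism:
    sp_configure 'number of worker processes', 150
    go
    sp_configure 'max parallel degree', 8
    go

    -- Session-level cap:
    set parallel_degree 4
    go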
One additional area for improvement was SAP ASE 15’s use of a static thread assignment model to execute
parallel query plans. The number of worker threads to be used for a query was determined at the start of
execution, based on the threads available at that point. If the worker threads were all in use, a query might start executing
serially, and even if another query finished and freed up worker threads, the query would continue to run with the statically
assigned execution plan.
To cater to VLDBs and large data volumes, SAP ASE now uses a dynamic thread assignment execution model
for parallel query plans. This methodology improves performance while using a smaller number of worker
threads to execute query plans in parallel, and it benefits VLDBs in several ways. First, continuing the earlier example,
once another query finished and freed threads, a query that had started serially could begin running in parallel. Second,
recalling the four-way and 16-way partitioned tables, each outer table worker process really only scans
a single inner table partition at a time; with dynamic thread assignment, an inner table
thread is only assigned per partition as needed. Rather than M+N (20 threads), the new
maximum would be M*2, or eight threads, and with partition elimination reducing the outer table to one or two partitions,
the maximum drops to two to four threads.
Another enhancement bound to provide performance improvement is dynamic load balancing between worker
threads during execution of a parallel query plan. The query plan is subdivided into smaller chunks and worker
threads are assigned to execute them. Worker threads that complete their chunks quickly return to pick up the
remaining chunks, balancing the work until execution completes.
INTRODUCE NEW TECHNIQUES TO IMPROVE PERFORMANCE
In-memory Database
It is essential for any database to adhere to the ACID properties (atomicity, consistency, isolation, durability), which
ensure consistency and recoverability. However, some applications and data are not particularly concerned with
ensuring that all modifications in a transaction are rolled back (atomicity) or that all committed transactions are
persisted (durability). Such applications can make use of SAP ASE’s in-memory database offering, which relaxes
its hold on atomicity and durability in exchange for greatly improved performance. SAP ASE provides two forms of
the in-memory offering:
•• IMDB – in-memory database;
•• RDDB – reduced-durability database.
In the IMDB offering, SAP ASE supports minimally logged DML. Since all the data is in memory and no disk
storage is involved, all data is lost in a hardware crash. SAP ASE does provide an option to create a template
of the in-memory database that can be used to recreate the database after a crash.
In the RDDB offering, the database remains fully disk-based, although many of the optimizations done for IMDB have
been implemented here as well. A relaxed-durability database exchanges the full durability of committed transactions for
enhanced runtime performance, especially useful for databases that exceed memory capacity or where an in-memory
database would not be cost effective.
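A minimal sketch of creating both database types; the cache, device and database names and sizes are illustrative, and the in-memory device step in particular should be checked against your version’s documentation:

    -- 1. Carve out an in-memory storage cache:
    sp_cacheconfig 'imdb_cache', '4G', 'inmemory_storage'
    go

    -- 2. Create an in-memory device on that cache:
    disk init name = 'imdb_dev1', physname = 'imdb_cache_dev',
        size = '4G', type = 'inmemory', cache = 'imdb_cache'
    go

    -- 3. A fully in-memory database (no durability):
    create inmemory database xoltp_db
        on imdb_dev1 = '4G'
        with durability = no_recovery
    go

    -- A disk-based, relaxed-durability database:
    create database staging_db on data_dev01 = '50G'
        with durability = at_shutdown
    go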
There are two areas where IMDB and RDDB aided VLDB implementations. The first was IMDB tempdbs, used
because VLDB environments typically serve a mix of applications with varying workloads and impacts on tempdb. Pure
OLTP applications need extremely fast tempdbs for short, small operations, while reporting systems tend to work with
larger volumes of data and, as a result, need larger tempdbs. For example, one way in which a lot of VLDBs “grew up”
is that extremely high transaction rates, driven by electronic feeds coupled with data retention, resulted in what
we might think of as OLTP VLDBs: databases with an extreme OLTP primary purpose that are also used for
operational reports.
In these situations, using a single style of tempdb often forces DBAs into a compromise: put tempdb on
cheaper storage to reduce the cost of the larger reporting requirements, or increase costs by placing
tempdb on more expensive storage. While multiple tempdbs helped mitigate this compromise, in many cases
the speed of OLTP transactions was in the eXtreme OLTP (XOLTP) range, so attempts to use RAMdisk/tmpfs and other OS
features were commonplace. With the IMDB capability, DBAs could instead create a tempdb completely in memory and
assign it to the XOLTP applications. This actually reduced costs, as the typical RAMdisk/tmpfs implementations
required double buffering in memory, once in the DBMS cache and once in the OS. It also improved speed, as SAP ASE
could operate on the database as fully in-memory instead of attempting physical I/O to a file system, whether the data was in
memory or not, with the associated file system restrictions.
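A sketch of an in-memory tempdb dedicated to XOLTP sessions, reusing the hypothetical naming from the earlier examples:

    create inmemory temporary database imdb_tempdb
        on imdb_dev2 = '2G'
        with durability = no_recovery
    go
    sp_tempdb 'add', 'imdb_tempdb', 'default'
    go
    sp_tempdb 'bind', 'lg', 'xoltp_login', 'DB', 'imdb_tempdb'
    go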
The second use case involves RDDB. VLDB systems often also receive daily bulk feeds, especially from business
partners. These bulk feeds need to be loaded into the database and run through business validation rules or data
scrubbing activities, and the new data may then need to be merged with existing data sets. Rather than performing this
load in the primary business database and incurring the extra logging overhead of the data loads and scrubbing actions,
one or more auxiliary databases in the same server would often be used as a staging area for the data, until it became
ready to be loaded into the primary database. For example, a hospital might submit a daily batch of billing records to a
health insurance claims processing system. Since the feed is likely in a fairly raw format, a series of data scrubs or
transformations would likely be necessary to get the data into a format that could easily be joined with the existing data.
The claims would have to be compared to existing members as well as validated to ensure appropriate rules are followed,
such as pre-authorization. In such systems, the process often uses one or more recovery points: if a failure or error occurs,
the data is simply set back to a recovery point and processing restarts from there.
Could an IMDB be used for the example above? Possibly, but since that process is only invoked once per day, the
memory set aside for the IMDB would be wasted during the remainder of the day. In addition, the feed could be quite
large, exceeding the amount of free memory available for an IMDB. Instead, this auxiliary database could
use RDDB minimal logging to improve the performance of data loads and scrubbing transactions, without concern
for recoverability once processing finishes.
Improved Join Methods — Enhance Star Joins
SAP ASE has recently added a plan optimization feature called the bloom filter, which is used to improve join performance.
A bloom filter provides early filtering of non-joinable rows before they reach the join operator, reducing the
total cost of the join. The bloom filter is implemented for hash joins, sort-merge joins and
reformatting-based nested-loop joins.
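Like other optimizer behavior, bloom filtering is governed by the optimization criteria framework described earlier. The toggle below is a hypothetical sketch; the criterion name (join_bloom_filter here) is an assumption and should be verified against the optimization criteria list for your ASE version:

    -- Hypothetical criterion name; verify for your release:
    set join_bloom_filter 1
    go
    -- showplan output indicates whether a bloom filter was applied to a join:
    set showplan on
    go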
Better Handling of LOB Data Providing Improved Performance
In an application environment, LOB data is handled by the application by executing SQLBulkOperations and by using
ODBC batching protocols such as SQLExecute/SQLExecDirect. Previously, because data-at-exec was not supported,
LOB data had to be fully materialized before executing these protocols. A new “data-at-exec” capability is now provided,
which lets the application avoid materializing LOB parameters prior to the call to SQLExecute. Instead, LOB data is sent
in chunks; the ODBC driver converts the data as it is received and holds it until all data for the parameter has arrived,
at which point it is sent to the server.
Online Index Creation
It is becoming more and more important for businesses to have high data availability. Traditionally, creating an index on a
large table has been known to cause significant data availability problems. SAP ASE includes the
create index... online parameter, providing the ability to create indexes without blocking access to the data being indexed.
Except for the sorted_data parameter, SAP ASE processes all other create index parameters the same way, with or
without the online parameter.
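A minimal sketch; the table and index names are hypothetical:

    -- Build a nonclustered index while concurrent DML continues:
    create nonclustered index idx_orders_cust
        on orders (cust_id)
        online
    go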
Enhanced Insert-select Performance
This feature was developed to improve insert select performance in SAP ASE by using bulk mode. It helps address
massive data insert performance for the data-warehousing data loading phase. In order to provide full control to the user,
SAP ASE introduces new optimizer criteria INS_BY_BULK via the existing User Defined Optimizer Goal framework. The
new optimizer criteria let the user to enable and disable this feature based on their need. The feature can be activated or
deactivated at multiple levels – server level, session level, procedure level and query level. This approach gives maximum
flexibility and granular control to choose the appropriate level that fits the application’s needs and does not compromise
the performance of the inserts, where parallelism does not yield good performance.
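A sketch of both activation styles, with hypothetical table names:

    -- Session level:
    set ins_by_bulk on
    go
    insert into sales_hist select * from sales_stage
    go
    set ins_by_bulk off
    go

    -- Query level, via the abstract plan clause:
    insert into sales_hist
    select * from sales_stage
    plan '(use ins_by_bulk on)'
    go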
STORAGE
SAP ASE provides compression functionality that optimizes disk space to better handle the deluge of data coming into
the database. SAP ASE now supports data compression, LOB compression and backup compression.
SAP ASE provides different levels of compression for regular and LOB data. The effects of compression can be seen in the
following ways:
•• Reduction in storage costs for online data;
•• Savings in I/O cost when caching compressed data;
•• Reduction in memory consumption when caching compressed data.
Data Compression/In-row LOB Compression
Data compression uses less storage space for the same amount of data, reduces cache memory consumption, and helps
improve performance.
In-row LOBs can be kept in-row up to a pre-defined size, with automatic transfer to off-row storage once the size is
exceeded. LOB compression uses the same techniques as backup compression (FastLZ, with lower CPU usage and
execution times; ZLib, with higher compression ratios). Supported LOB datatypes are text, image, unitext, off-row Java
objects and XML.
SAP ASE is quite flexible in that a table can mix compressed and uncompressed data. This is true both from
a columnar perspective, since volatile columns can be left uncompressed while the others are compressed, and at the
partition level, since individual partitions can have different levels of compression. Compression can be turned on or off
dynamically, and even if the option is turned off, compressed rows can still be accessed. Since enabling or disabling
compression only affects future data modifications, SAP ASE uses the reorg rebuild command to compress pre-existing
data; because reorg rebuild includes an online option, compression can be enabled or disabled without interrupting user
access. In addition, page-level compression is performed by the housekeeper process, which offloads the compression
overhead and time from individual application users.
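A minimal sketch of table-level compression; names and the LOB compression level are illustrative, and clause placement may vary slightly by version:

    -- Page-compressed table with compressed LOB data:
    create table documents (
        doc_id int  not null,
        body   text null
    )
    with compression = page, lob_compression = 5
    go

    -- Change the scheme later; reorg rebuild applies it to pre-existing rows:
    alter table documents set compression = row
    go
    reorg rebuild documents with online
    go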
Backup Compression — Compressing a Dump
One of the responsibilities of a DBA is to ensure that the database is backed up and the latest backup files are available
in case there is a need to recover from a crash or a similar situation. SAP’s backup server is used for this purpose. As
the data and log grow, the size of the backup files also grows significantly. Since most of this is
mission-critical data, DBAs usually take a database backup on a daily basis and transaction backups on an hourly basis,
varying with their needs. This translates into large amounts of storage space for maintaining
backup files. SAP’s compression technology has been extended to the backup server, so databases and
transaction logs can be compressed, reducing the space requirements for archived or backed-up databases. The
DBA chooses from a list of compression levels based on performance requirements.
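A sketch of compressed dumps; paths are illustrative, and the available level values (ZLib levels 1-9, FastLZ levels 100-101 in recent versions) should be confirmed for your release:

    dump database proddb to '/backups/proddb.full.dmp'
        with compression = 101
    go
    dump transaction proddb to '/backups/proddb.tran.dmp'
        with compression = 101
    go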
Deferred Table Creation
When designing a database, depending on the type of data to be stored, anywhere from a few tables to several hundred
or thousand tables may be created. Every time a table is created, SAP ASE reserves an extent for
each of its data and index partitions, resulting in a lot of space being allocated. However, depending on how
the application uses the database, often only a small number of tables actually get used, resulting in
substantial wasted space. With deferred table creation, SAP ASE allows the page allocation for a table to be deferred
until the table is actually needed. Deferred tables are helpful for applications that create many tables at installation
but only use a small number of them. Tables are called “deferred” until SAP ASE allocates their pages; they still
have entries in the system tables, which allows the objects associated with them, such as views, procedures and
triggers, to be created. SAP ASE performs the page allocation for a deferred table when the first row is inserted
(called table materialization). Before the first insert, operations that access the table, such as selects, deletes, updates,
functions that report space usage, or referential integrity checks during DML on other tables, behave as if the table
were empty.
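A minimal sketch, with a hypothetical table:

    -- No pages are allocated at creation time:
    create table audit_overflow (
        event_id int          not null,
        payload  varchar(255) null
    )
    with deferred_allocation
    go

    -- The first insert materializes the table, allocating its pages:
    insert into audit_overflow values (1, 'first row')
    go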
SCALE UP
Several enhancements have been made in SAP ASE 16.0 to improve scalability and associated performance. Some of the
key areas include run-time logging, lock management as well as metadata and latch management.
Metadata and Latch Management Enhancements
It has been observed that concurrent activity on a single SAP ASE page under extreme transaction rates results in
contention, typically manifesting as data cache spinlock contention on a single cache partition. This is caused by a very
high rate of latch requests on the data page at the center of the activity. High-workload scenarios are also prime
candidates for contention on the procedure cache.
SAP ASE resolves these issues by addressing the structural aspects of its internals. In the case of
data cache spinlock contention, SAP ASE utilizes the existing stored procedure sp_chgattribute with the exp_row_size
attribute to help manage space on the server and allow the user to set expected row sizes. When the number of rows per
page is decreased, for example by setting the expected row size close to the logical page size, contention reduces
significantly. As for contention on the procedure cache, SAP ASE now sets aside local memory for each engine,
eliminating contention since access is to engine-local memory. A further enhancement enables the engine local cache by
default and increases the share of procedure cache allocated to it.
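A sketch of the expected-row-size adjustment on a hypothetical hot table; the value shown assumes a 2K logical page, leaving roughly one row per page:

    sp_chgattribute 'hot_orders', 'exp_row_size', 1960
    go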
Lock Management Enhancements
Lock contention in SAP ASE usually shows up as significant spinlock contention in the system.
Scaling up hardware and user activity to match larger data sizes used to hit roadblocks due to this lock contention,
so SAP ASE has introduced several enhancements that help reduce it. As explained earlier, increasing the procedure
cache size allocated to the engine local cache increased the number of locks an engine can cache locally. Another
enhancement increased the transfer size of locks between the global and local caches, so that blocks of locks can be
moved instead of individual locks; this improved lock transfers and reduced how often locks must be reclaimed from
local caches back to the global cache. One particularly effective enhancement for alleviating high table lock contention
allows data-only-locked tables to be placed in the “hot table” category. SAP has implemented all these enhancements so
that no configuration changes or user intervention are required.
Run-time Logging Enhancements
In an active OLTP system, tables are being updated, data is being inserted or deleted and performance is of prime
importance. With all these happening concurrently and when log records are transferred to syslogs, SAP ASE acquires
a lock on the log, which can end up being a point of contention. This translates into performance issues during run-time
logging. To avoid this kind of scenario, SAP ASE uses a user log cache for each transaction. There is a user log cache
queue being created which is partitioned into smaller blocks, each the size of the server’s logical page size. SAP ASE
moves the log records in each block to a global queue from where it is transferred to the syslogs. The advantage of doing
this is that SAP ASE sends batches of log records to syslogs instead of individual records, preferably when the transaction
is complete. This provides reduced contention and improved performance of run-time logging.
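The size of the per-transaction user log cache is configurable; a sketch with an illustrative value:

    sp_configure 'user log cache size', 65536
    go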
SUMMARY
SAP ASE is designed to address the challenges of terabyte-scale environments and continues to address challenges
around big data. In the database world, increased data volume translates into issues such as limits on operational
scalability, decreased performance and a storage crunch. The features and enhancements introduced in
SAP ASE address the challenges around performance, scalability, storage and diagnostics, successfully
enhancing the VLDB experience for customers.
WWW.SAP.COM
© 2014 SAP AG or an SAP affiliate company. All rights reserved.
No part of this publication may be reproduced or transmitted in any form or
for any purpose without the express permission of SAP AG.
The information contained herein may be changed without prior notice.
Some software products marketed by SAP AG and its distributors contain
proprietary software components of other software vendors. National
product specifications may vary.
These materials are provided by SAP AG and its affiliated companies (“SAP
Group”) for informational purposes only, without representation or warranty
of any kind, and SAP Group shall not be liable for errors or omissions with
respect to the materials. The only warranties for SAP Group products and
services are those that are set forth in the express warranty statements
accompanying such products and services, if any. Nothing herein should be
construed as constituting an additional warranty.
SAP and other SAP products and services mentioned herein as well as their
respective logos are trademarks or registered trademarks of SAP AG in
Germany and other countries.
Please see http://www.sap.com/corporate-en/legal/copyright/index.epx#trademark for additional trademark information and notices.