Optimizing Large Data Handling in SAP® ASE for Better Performance

TABLE OF CONTENTS

Overview
SAP Adaptive Server Enterprise and VLDB
Overhauling SAP ASE to Prepare for VLDB
    Optimizer Rewrite
    Data Storage & Devices
    Multiple Tempdbs & RLC
    Data & Index Partitioning
    Partitions & Database Maintenance
    Partitions & Query Parallelism
Introduce New Techniques to Improve Performance
    In-memory Database
    Improved Join Methods – Enhance Star Joins
    Better Handling of LOB Data Providing Improved Performance
    Online Index Creation
    Enhanced Insert-select Performance
Storage
    Data Compression/In-row LOB Compression
    Backup Compression – Compressing a Dump
    Deferred Table Creation
Scale Up
    Metadata and Latch Management Enhancements
    Lock Management Enhancements
    Run-time Logging Enhancements
Summary

OVERVIEW

Data is becoming the lifeline of any organization that wants to maintain its competitive advantage in its industry. In today's fast-paced and competitive world, insights derived from data enable organizations to make the right decisions at the right time, optimize their operations, and provide better products and services to customers. Traditional database growth has been attributed to various factors, including indexes built in the database, backups, and new compliance requirements, such as Sarbanes-Oxley and HIPAA, that warrant retaining historical data.
With large amounts of data constantly flowing into the organization, IT departments have to ensure that there is sufficient storage and processing power, while DBAs have to make sure the database is well designed and tuned so that queries run fast without encountering storage-related bottlenecks or other issues. Organizations increasingly depend on the database provider to optimize the database and deliver better performance within a stable database environment.

SAP ADAPTIVE SERVER ENTERPRISE AND VLDB

SAP Adaptive Server® Enterprise (ASE) offers a versatile, high-performance, enterprise-class relational database management system that is well suited for mission-critical, data-intensive environments. It is designed to handle the most demanding OLTP (online transaction processing) environments and has refined capabilities, among non-business-warehouse products, for decision support and analytical workloads. When considering the characteristics of a Very Large Database (VLDB), the following often stand out as key requirements, in addition to the simplistic "large" storage one:

• High user concurrency/high transaction concurrency;
• High or continuous availability;
• User concurrency even during bulk operations, such as bulk feeds or archival processing;
• Large/very large SMP servers with massive amounts of memory and high CPU core counts;
• Complex query optimization and parallel query features for operational reporting.

OVERHAULING SAP ASE TO PREPARE FOR VLDB

Although SAP ASE had done very well in the performance and management area with its support for row-level locking, dynamic memory allocation and distributed joins, it showed some limitations in the way it handled very large databases. To address these concerns, SAP has progressively introduced new functionality that changes the way SAP ASE handles large databases.
Starting with SAP ASE 15, a complete rewrite of the optimizer, along with the introduction of partitioning and improvements in parallelism, helped alleviate some of the performance problems encountered when executing complex queries on very large data sets.

Optimizer Rewrite

SAP ASE 15 saw a major improvement as the optimizer went through a complete overhaul. The old pre-15 optimizer was focused on OLTP; it had limited support for analytics-oriented query optimization techniques and no effective means to control the optimizer beyond simplistic index and join-order forcing. The SAP ASE 15 optimizer therefore added a number of new features, including:

• Configurable optimizer controls and optimization goals that could be set on a server-wide, session, or query basis;
• Query-level execution plan syntax that controlled which optimization and execution strategies would be used, beyond the simplistic index and join-order forcing;
• Better and more effective support for parallel threads, eliminating the thread "explosion" found in previous releases and increasing the number of concurrent users who could benefit from parallel query;
• Improved query optimization diagnostics and statistics;
• Additional join and group-by processing techniques better adapted to reporting-type applications;
• In-memory sorting techniques to improve overall query response times;
• Improved costing algorithms for better index selection, join orders and strategies.

The most noticeable change was delivered by the optimizer controls and optimization goals. SAP ASE now has optimization goals, which provide hints to the optimizer to choose the right query plan for executing the query.
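These goals can be applied at any of the three scopes. A minimal sketch follows; the table name titles and the filter are purely illustrative, while the statements themselves use the documented ASE syntax:

```sql
-- Server-wide default goal (dynamic configuration parameter)
sp_configure "optimization goal", 0, "allrows_mix"
go

-- Session-level goal for a purely OLTP connection
set plan optgoal allrows_oltp
go

-- Query-level goal supplied through an abstract plan clause
select title_id, price
from titles
where price > 20
plan "(use optgoal allrows_dss)"
go
```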
Here are the optimization goals, which can be set at a server, session or query level:

• allrows_mix – the default goal, as well as the most useful goal in a mixed-query environment; balances the needs of OLTP and DSS query environments;
• allrows_oltp – the most useful goal for purely OLTP queries;
• allrows_dss – the most useful goal for operational DSS queries of medium-to-high complexity.

These optimization goals sit at a higher level of abstraction. At a lower level, SAP ASE provided optimization controls that allowed users to selectively disable different join, grouping, distinct-processing, sort or other specific query execution methods, as desired. Alternatively, a technique could be enabled under an optimization goal that normally would not use it. As an example, a DSS-style hash join could be enabled for an application using the allrows_oltp optimization goal. These controls could be enabled for a session or for an individual query. Along with these improvements, the SAP ASE optimizer provided cost-based pruning mechanisms that use the estimated costs of sub-plans to avoid analyzing expensive sub-plans.

Throughout the remainder of the SAP ASE 15.x platform lifecycle, the optimizer was continually improved, to the point that shortly after announcing support for SAP applications, SAP ASE began claiming benchmark records with the standard SAP SD Benchmark. SAP ASE was outperforming competitors' solutions, despite their more than 20 years of working with SAP applications and their numerous optimizer hints implemented specifically to support an ERP application.

Data Storage & Devices

As the data volume increases, the database size increases. This translates into a larger number of devices that need to be used, since their number depends on the size of the devices that can be configured. Prior to SAP ASE 15, SAP ASE could only support up to 256 devices of 32GB each.
In addition, each database was limited to two billion pages, resulting in a theoretical database size limit of 32TB. In reality, however, because of the 256-devices-of-32GB limit, the real database size was restricted to 8TB. With many VLDBs in the tens of terabytes, the 8TB limit posed a challenge. The first change in SAP ASE 15 was to increase the number of devices from 256 to 2 billion (2,147,483,648) and the maximum size of a database device from 32GB to 4TB. This allowed a single database to hit the theoretical limit of 32TB based on two billion pages of 16KB. The second change came when unsigned integer types were added and the page id was treated as an unsigned integer. That meant a single database could contain four billion pages, for a total of 64TB, and the server could support over one exabyte of data across multiple databases. These changes were a good first step, but a number of major issues still needed to be tackled to handle large data volumes effectively, including the ability to perform maintenance on such large systems and to control parallelism.

Multiple Tempdbs & RLC

Temporary databases in SAP ASE have long been used for a number of reasons. First, SAP ASE used tempdb for worktables related to query processing when executing complex queries or when sorting was necessary. Secondly, developers often used tempdb to store interim result sets when breaking up complex, large processing steps into small discrete steps that could be implemented more easily. Initially, SAP ASE only supported a single tempdb. This quickly became a bottleneck under high concurrency, simply due to log semaphore contention. It also created a single point of failure in large OLTP VLDB systems: if a user working with a large table generated an interim result, it could fill the available tempdb space, bringing all processing to an immediate halt.
As a result, prior to SAP ASE 15, SAP ASE implemented multiple tempdbs, supporting a single system tempdb and multiple user-defined tempdbs. When using multiple tempdbs, SAP ASE allowed DBAs to define tempdb "groups", or subsets of the multiple tempdbs. Individual users or applications could then be bound to the different tempdb groups, and each session would be round-robin assigned to one of the tempdbs that was a member of its group. In doing so, each application could be isolated from the impacts of other applications, and different users could be isolated from the impacts of others. If a single tempdb filled up, only those users currently assigned to that tempdb would be affected. SAP ASE also introduced a tempdb space resource governor that allowed DBAs to put limits on how much tempdb space a single user could consume. When the limit was reached, the DBA could either kill that query or warn the user about the limits before the query affected other users.

While useful, the implementation of multiple tempdbs was still not enough to completely address the needs of VLDBs, and further enhancements were implemented. Prior to SAP ASE 15, the system catalog tables sometimes used page locks even if the server's default locking scheme was set to row-level locking. Page-level catalog locking caused a huge problem for all types of applications, but especially for OLTP VLDBs. Most OLTP VLDBs had large user populations and were used in a mixed-workload fashion, supporting both pure online transactions and operational reporting. The mix of high concurrency and complex queries resulted in a lot of lock contention in tempdb. The use of multiple tempdbs helped, as the contention could be spread among multiple instances; however, in some VLDB scenarios, the remaining contention was such that deployment sites were using tens of tempdbs in an attempt to reduce the catalog contention.
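As a sketch of the multiple-tempdb mechanics described above; the database, device and group names are illustrative, not from the source:

```sql
-- Create a user-defined temporary database on its own device
create temporary database oltp_tempdb
on tempdb_dev = 2048
go

-- Add it to the default tempdb group so that sessions bound to
-- that group can be round-robin assigned to it
sp_tempdb "add", "oltp_tempdb", "default"
go
```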
In addition, when a procedure needed to be re-resolved in systems with many stored procedures or views, SAP ASE had to update system tables in the user database with new procedure-tree information. These tables became a source of contention under high concurrent usage, which is typical of OLTP VLDBs. To address these concerns, SAP ASE 15 moved all the system catalog tables from page-level locks to row-level locks. Row Locked Catalogs (RLC), as this was called, resolved not only the tempdb contention, but also the contention on sysprocedures in user databases. It also aided database maintenance activities: concurrent update statistics and similar commands used to block on systabstats/sysstatistics, a real problem for VLDBs, as DBAs had to run admin commands one table at a time on the larger tables.

Data & Index Partitioning

One of the key features introduced in SAP ASE to handle very large databases is table partitioning. As a table grows, it becomes more and more cumbersome to query and manage. Partitions in SAP ASE help by allowing the DBA to divide tables and indexes into smaller, more manageable chunks. This allows faster, easier data access and more efficient maintenance activities. SAP ASE supports horizontal partitioning, in which a specific set of table rows can be distributed to different partitions, which can be placed on different disk devices. The following are the partitioning strategies offered in SAP ASE:

• Hash partitioning (semantic) – a system-supplied hash function determines the partition assignment for each row;
• List partitioning (semantic) – values in key columns are compared with sets of user-supplied values specific to each partition; exact matches determine the partition assignment;
• Range partitioning (semantic) – values in key columns are compared with a user-supplied set of upper and lower bounds associated with each partition; key column values falling within the stated bounds determine the partition assignment;
• Round-robin partitioning – rows are assigned to partitions in a round-robin manner so that each partition contains a more or less equal number of rows. This is the default strategy.

Prior to SAP ASE 15, SAP ASE only supported round-robin partitioning. Initially, this was done to alleviate contention on high-insert heap tables (or tables with clustered indexes on monotonic sequences) caused by multiple concurrent sessions competing for the last page. While this helped high-speed OLTP systems, it proved to be a pain point for VLDB operations in several ways. First, it was not possible to control which partition bulk feeds were loaded into, and archive processes could not reliably predict in which partition older data resided. In addition, since the data could be in any partition, parallel query methods inefficiently ended up spawning MxN threads when performing joins in parallel. SAP ASE 15 added semantic partitioning, in which data placement is based on selected column values instead of the session. Using semantic partitioning on tables proved beneficial for reducing maintenance time and improving parallel query efficiency.

Partitions & Database Maintenance

It is an unfortunate reality that database tables require both logical and physical maintenance. At a logical level, older rows may need to be archived or daily bulk feeds loaded. From a physical perspective, tables need their index statistics maintained and may need to be reorganized due to fragmentation or other attributes.
With VLDB systems, maintenance on a single large glob of a table is grossly inefficient. Most of the data, probably 90% of it or more, is not changing and needs neither logical (archive/bulk feed) nor physical (update statistics/reorg) maintenance. Yet some of the physical operations would move that static 90% of the data around, or scan it for statistics updates, while archive processes would perform huge deletes or similar operations on it. By partitioning a table, maintenance could be made much easier. First, physical maintenance commands such as reorg or update statistics could be run on a single partition, reducing the time and impact by multiple orders of magnitude. As an example, a single update statistics command that would have run for eight hours on an unpartitioned table could now run in less than 30 seconds, on just the partitions that required it. Second, logical operations such as archiving could leverage truncate partition instead of huge deletes that would have taken much longer. Additionally, bulk feeds could be targeted precisely at the correct partitions, which was impossible to achieve with the round-robin partitioning of pre-15 servers, and archive processes could target the desired partitions more selectively. This made large tables more manageable from both a pure database administration standpoint and the business application perspective.

With the addition of split, merge and move partition capabilities, SAP ASE provided useful tools for improving physical maintenance. For example, it might be useful to have monthly partitions for the current and previous year's data. However, if the data retention requirements demand that 10 years of data be preserved, you quickly end up with hundreds of partitions, which, while working fine, may lead to administrative headaches and query processing overhead.
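The partition-level maintenance described above might look like the following sketch; the table and partition names (orders, p2024m06, p2015q1) are hypothetical:

```sql
-- Refresh statistics for a single partition rather than the whole table
update statistics orders partition p2024m06
go

-- Reorganize just the one partition that needs it
reorg rebuild orders partition p2024m06
go

-- Archive by truncating an aged partition instead of a huge delete
truncate table orders partition p2015q1
go
```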
Using merge partition, partitions in the three-to-five-year-old range could be merged into quarterly partitions, while partitions older than five years could be merged into yearly partitions. In addition, as the data ages it could be moved to lower-cost storage, as it might not be accessed as often, and the performance of lower-cost storage may be adequate for those requirements.

Partitioning was further improved in SAP ASE 16 with the addition of partition locking. Traditionally, SAP ASE has supported table-level, page-level and row-level locks. When large modifications within a single partition exceeded the page- or row-lock promotion threshold, SAP ASE often acquired a table lock, which essentially restricted concurrent DML activity on the entire table, regardless of which partition was being modified. This was problematic: large inserts from daily bulk feeds would block users from looking at yesterday's or earlier data, and large deletes performed as part of archive operations had a similar impact. With data flowing in from all directions, intelligent analysis of this data requires quick and continuous access to it. SAP ASE therefore introduced partition-level locks, which enable lock promotion at the partition level and allow DDL/DML access to other partitions while concurrent activity runs on one partition of the same table.

Partitions & Query Parallelism

As mentioned previously, with the pre-15 partitioning, which was based strictly on round-robin partitions, parallel queries involving joins caused a veritable explosion of worker processes, as each worker process on the outer table had to spawn additional worker processes for each of the inner table join materialization probes. This was the dreaded MxN thread explosion problem. For example, if a table of four partitions was joined with a table of 16 partitions, the resulting query would often use 48 worker processes.
Each of the four worker processes on the outer table would spawn 16 additional worker threads to scan each of the 16 inner table partitions. This was necessary because, without semantic partitioning, the joining data could be in any one of the partitions. Because only a limited number of CPU cores is available at a given moment, and there are limits to the number of worker processes a single core can support, a single parallel query (or at most three) often wiped out all available worker processes and totally consumed the machine, negatively impacting other users even on some of the largest SMP boxes available. Think of it: just three queries at ~48 threads each is pushing 150 worker threads, which would likely swamp a 48-core host completely.

With semantic partitioning, a vector parallelism approach could be used, dropping the number of worker processes from MxN to, in the worst case, M+N. Let's again take a table with four partitions joining a table with 16 partitions, both partitioned on the same key. SAP ASE 15 would start off with four worker threads for the outer table, as before. However, due to partition elimination, each of the four worker threads would only need to spawn one additional worker thread per qualifying partition, based on the partition key. The net result would be a single worker process per inner table partition instead of multiple, for a total of only 20 worker threads instead of the 48 used in the previous example. This number could be reduced even further: partition elimination can also apply to the outer table, so the outer table might more commonly use only one or two worker threads, for a total worker thread consumption of four to eight per query.
Instead of allowing two to three users to exhaust the 150-worker-thread pool, it would now take 20-30 queries to achieve the same level of resource utilization. This means more users could benefit from query parallelism, since the users running parallel queries would have considerably less impact on those who were not.

One additional area of improvement was identified in SAP ASE 15's use of a static thread assignment model for executing parallel query plans. The number of worker threads to be used by a query was determined at initial runtime, based on the threads available at that point. If the worker threads were all in use, a query might start executing serially; even if another query then finished and freed up some worker threads, the query would continue to run with the statically assigned execution plan. To cater to VLDBs and large data volumes, SAP ASE now uses a dynamic thread assignment execution model for parallel query plans. This methodology improves performance while using a smaller number of worker threads to execute query plans in parallel. It benefits VLDBs in several ways. First, in the example just given, the serially started query could begin running in parallel once another query finished. Second, in the earlier example of a four-way partitioned table joined to a 16-way partitioned table, each outer table worker process really only scans a single inner table partition at a time. With dynamic thread assignment, each inner table thread is only assigned per partition as necessary, so rather than M+N (20 threads), the new maximum is M*2, or 8 threads. With partition elimination reducing the outer table to one or two partitions, the query needs at most two to four threads.
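The worker-thread pool that these parallel plans draw from is sized by server and session settings; a brief illustrative sketch, where the values are examples rather than recommendations:

```sql
-- Size the server-wide pool of worker threads
sp_configure "number of worker processes", 150
go

-- Cap how many worker processes a single query may use server-wide
sp_configure "max parallel degree", 20
go

-- Tighter cap for an individual session
set parallel_degree 4
go
```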
Another enhancement bound to improve performance is dynamic load balancing between worker threads during the execution of a parallel query plan. The query plan is subdivided into smaller chunks, and worker threads are assigned to execute these chunks. Worker threads that complete their chunks quickly return to pick up the remaining chunks of the query plan until execution completes.

INTRODUCE NEW TECHNIQUES TO IMPROVE PERFORMANCE

In-memory Database

It is essential for any database to adhere to the ACID properties (atomicity, consistency, isolation, durability), which ensure consistency and easy restorability. However, some applications and data do not strictly require that all modifications in a transaction be rolled back (atomicity) or that all committed transactions be persisted (durability). Such applications and data can make use of SAP ASE's in-memory database offering, which relaxes atomicity and durability in exchange for great performance. SAP ASE provides two forms of in-memory offering:

• IMDB – in-memory database;
• RDDB – reduced-durability database.

In the IMDB offering, SAP ASE supports minimally logged DML. Since all the data is in memory and no disk storage is involved, all data is lost on a hardware crash. SAP ASE does provide an option of creating a template of the in-memory database that can be used to recreate the database after a hardware crash. In the RDDB offering, everything is fully disk-based, although many of the optimizations done for IMDB have been implemented here as well.
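A sketch of how each offering is declared; the database and device names are illustrative, the in-memory storage cache and device setup steps are omitted, and the durability options follow the ASE syntax:

```sql
-- An in-memory database, assuming an in-memory device imdb_dev has
-- already been created on an in-memory storage cache
create inmemory database fast_db
on imdb_dev = 4096
with durability = no_recovery
go

-- A relaxed-durability database stays on disk but trades full
-- durability of committed transactions for runtime performance
create database staging_db
on data_dev = 8192
log on log_dev = 2048
with durability = at_shutdown
go
```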
A relaxed-durability database exchanges the full durability of committed transactions for enhanced runtime performance, especially for databases that exceed memory capacity or where an in-memory database would not be cost-effective.

There are two areas where IMDB and RDDB aided VLDB implementations. The first was IMDB tempdbs, which were used because VLDB environments typically serve a mix of applications with varying workloads and impacts on tempdb. Pure OLTP applications need extremely fast tempdbs with short/small usage characteristics, while reporting systems tend to work with larger volumes of data and, as a result, need larger tempdbs. For example, one way in which a lot of VLDBs "grew up" is that extremely high transaction rates, caused by electronic feeds coupled with data retention, resulted in what we might think of as OLTP VLDBs: databases with an extreme-OLTP primary purpose that are also used for operational reports. In these situations, using a single style of tempdb often forces DBAs into a compromise: put tempdb on cheaper storage to reduce the cost of the larger reporting requirements, or increase costs by placing tempdb on more expensive storage. While multiple tempdbs helped mitigate this compromise, in many cases the speed of OLTP transactions was in the eXtreme OLTP (XOLTP) range, so attempts to use RAMDisk/tmpfs and other OS features were commonplace. With the IMDB capability, DBAs could instead create a tempdb completely in memory and assign it to the XOLTP applications. This actually reduced costs, as the more typical RAMDisk/tmpfs implementations required double buffering in memory: once in the DBMS cache and once in the OS. It also improved speed, as SAP ASE could operate on the tempdb as fully in-memory instead of attempting physical I/O to a file system, whether the data was in memory or not, with the associated file system restrictions. The second use case involves RDDB.
VLDB systems often receive daily bulk feeds, especially from business partners. These feeds need to be loaded into the database, run through business validation rules or data scrubbing activities, and then possibly merged with existing data sets. Rather than performing the load in the primary business database and incurring the extra logging overhead of the data loads and scrubbing actions, one or more auxiliary databases in the same server are often used as a staging area for the data until it is ready to be loaded into the primary database. For example, a hospital might submit a daily batch of billing records to a health insurance claims processing system. Since the feed is likely in a fairly raw format, a series of data scrubs or transformations would probably be necessary to get the data into a format that could easily be joined with the existing data. The claims would have to be compared to existing members as well as validated to ensure that appropriate rules, such as pre-authorization, are followed. In such systems, the process often uses one or more recovery points: if a failure or error occurs, the data is simply set back to a recovery point and processing restarts from there.

Could an IMDB be used for the example above? Possibly, but since that process is only invoked once per day, the memory set aside for the IMDB would be wasted during the remainder of the day. In addition, the feed could be quite large, exceeding the amount of free memory available for an IMDB. Instead, this auxiliary database could use RDDB minimal logging to improve the performance of data loads and scrubbing transactions, without concern for recoverability after processing finishes.

Improved Join Methods – Enhance Star Joins

SAP ASE has recently added a plan optimization feature called the bloom filter, which is used to improve join performance.
A bloom filter provides early filtering of non-joinable rows before they reach the join operator, reducing the total cost of the join. The bloom filter is implemented for hash joins, sort-merge joins and reformatting-based nested-loop joins.

Better Handling of LOB Data Providing Improved Performance

In an application environment, LOB data is handled by the application through SQLBulkOperations and ODBC batching protocols like SQLExecute/SQLExecDirect. Previously, data-at-exec was not supported, so LOB data had to be fully materialized before executing these protocols. A new capability called data-at-exec helps the application avoid materializing LOB parameters prior to the call to SQLExecute. Instead, LOB data is sent in chunks; the ODBC driver converts the data as it is received and holds on to it until all data for the parameter has arrived, at which time it is sent to the server.

Online Index Creation

It is becoming more and more important for businesses to have high data availability. Traditionally, creating an index on a large table has been known to cause a significant data availability problem. SAP ASE includes the create index... online parameter, providing the ability to create indexes without blocking access to the data being indexed. Except for the sorted_data parameter, SAP ASE processes the other create index parameters the same way with or without the online parameter.

Enhanced Insert-select Performance

This feature was developed to improve insert-select performance in SAP ASE by using bulk mode. It helps address massive data insert performance for the data loading phase of data warehousing. To provide full control to the user, SAP ASE introduces a new optimizer criterion, INS_BY_BULK, via the existing user-defined optimizer goal framework.
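A sketch of enabling the criterion at session and query scope; the table names are illustrative:

```sql
-- Session level: subsequent insert...select statements may use bulk mode
set ins_by_bulk on
go

insert into sales_history
select * from sales_staging
go

set ins_by_bulk off
go

-- Query level, via an abstract plan hint on a single statement
insert into sales_history
select * from sales_staging
plan "(use ins_by_bulk on)"
go
```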
The new optimizer criterion lets users enable or disable this feature based on their needs. The feature can be activated or deactivated at multiple levels: server, session, procedure and query. This approach gives maximum flexibility and granular control to choose the level that fits the application's needs, without compromising insert performance in cases where parallelism does not yield good results.

STORAGE

SAP ASE provides compression functionality that optimizes disk space to better handle the deluge of data coming into the database. SAP ASE now supports data compression, LOB compression and backup compression, with different levels of compression for regular and LOB data. The effects of compression can be seen in the following ways:

• Reduction in storage costs for online data;
• Savings in I/O cost when caching compressed data;
• Reduction in memory consumption when caching compressed data.

Data Compression/In-row LOB Compression

Data compression uses less storage space for the same amount of data, reduces cache memory consumption, and helps improve performance. In-row LOBs can be maintained up to a pre-defined size, with automatic transfer to off-row storage once that size is exceeded. LOB compression uses the same techniques as backup compression (FastLZ, with lower CPU usage and execution times; ZLib, with higher compression ratios). The LOB datatypes supported are text, image, unitext, off-row Java objects and XML. SAP ASE is quite flexible in that tables can hold a mix of compressed and uncompressed data. This is true both from a columnar perspective, since volatile columns can be left uncompressed while the others are compressed, and for table partitions, which support different levels of compression. Compression can be turned on and off dynamically, and even when the option is turned off, previously compressed rows can still be accessed.
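A minimal sketch of what this looks like in practice, with hypothetical table and column names (syntax may differ slightly between ASE versions):

```sql
-- Enable page-level compression on an existing table; this affects
-- future data modifications (hypothetical table name).
alter table claims
    set compression = page

-- A new table can mix compressed and uncompressed columns: here a
-- volatile status column is explicitly left uncompressed while the
-- rest of the table uses row compression.
create table claims_archive (
    claim_id      int           not null,
    status        varchar(20)   not null  not compressed,
    claim_detail  varchar(4000) null
)
with compression = row
```

Because enabling compression on an existing table applies only to rows written afterwards, pre-existing rows stay uncompressed until they are rewritten.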
Since enabling or disabling compression only affects future data modifications, SAP ASE uses the reorg rebuild command to compress pre-existing data. Because reorg rebuild includes an online option, compression can be enabled or disabled without interrupting user access. In addition, page-level compression is performed by the housekeeper process, which shifts the overhead and time of compression away from individual application users.

Backup Compression – Compressing a Dump

One of the responsibilities of a DBA is to ensure that the database is backed up and that the latest backup files are available in case there is a need to recover from a crash or a similar situation. SAP's backup server is used for this purpose. As the size of the data and log grows, the size of the backup file also increases significantly. Since most of this is mission-critical data, DBAs usually take a database backup daily and a transaction log backup hourly, though the schedule may vary with their needs. This translates into large amounts of storage space for maintaining backup files. SAP's compression technology has been extended to the backup server as well, so that databases and transaction logs can be compressed, reducing the space required for archived backups. The DBA can choose from a list of compression levels based on performance requirements.

Deferred Table Creation

When designing a database, depending on the type of data to be stored, anywhere from a few tables to several hundred or thousand tables may be created. Every time a table is created, SAP ASE reserves an extent for each of its data and index partitions, resulting in a lot of space being allocated and used. However, depending on how the application uses the database, most of the time only a small number of tables actually get used.
This results in a substantial waste of space. With deferred table creation, SAP ASE allows the page allocation for a table to be deferred until the table is actually needed. Deferred tables are helpful for applications that create many tables at installation but use only a small number of them. Tables are called "deferred" until SAP ASE allocates their pages. Deferred tables have entries in the system tables, which allows objects associated with them, such as views, procedures and triggers, to be created. SAP ASE performs the page allocation for a deferred table when the first row is inserted (called table materialization). Before the first insert, operations on the table, such as selects, deletes and updates, commands that report space usage, and referential integrity checks during DML on other tables, behave as if the table were empty.

SCALE UP

Several enhancements have been made in SAP ASE 16.0 to improve scalability and associated performance. Some of the key areas include run-time logging, lock management, and metadata and latch management.

Metadata and Latch Management Enhancements

Concurrent activity on a SAP ASE page under extreme transaction rates results in contention, typically manifesting as data cache spinlock contention on a single cache partition. This is caused by a very large number of latch requests on the data page at the center of the activity. High-workload scenarios are also prime candidates for contention on the procedure cache. SAP ASE addresses these issues at the level of its internal structures. For data cache spinlock contention, SAP ASE uses the existing stored procedure sp_chgattribute with the exp_row_size attribute to manage space on the server, allowing the user to set the expected row size. When the number of rows per page is decreased, for example toward one row per logical page, contention is reduced significantly.
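The exp_row_size technique described above might be applied as follows; the table name and value are hypothetical, and exp_row_size is given in bytes:

```sql
-- Raise the expected row size so that fewer rows fit on each page,
-- spreading hot rows across more pages and reducing latch and
-- spinlock pressure on any single page (hypothetical table and value).
sp_chgattribute "claims", "exp_row_size", 512
```

The trade-off is more pages (and therefore more space) for the same row count, so the value is best tuned against the observed contention rather than set uniformly across tables.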
For contention on the procedure cache, SAP ASE now sets aside local memory for each engine, eliminating contention because access is from engine-local memory. A further enhancement was to enable the engine local cache by default and to increase the share of procedure cache allocated to it, in effect increasing the size of the engine local cache.

Lock Management Enhancements

Lock contention in SAP ASE usually appears as significant spinlock activity in the system. Scaling up hardware and user activity to match larger data sizes used to hit roadblocks because of this contention, so SAP ASE has introduced several enhancements to reduce it. As explained earlier, increasing the procedure cache allocated to the engine local cache increased the number of locks an engine can cache locally. Another enhancement increased the transfer size of locks between the global and local caches, so that blocks of locks can be sent instead of individual locks; this improves lock transfers and reduces how often locks must be reclaimed from local caches back to the global cache. One particularly useful enhancement for alleviating high table lock contention allows data-only-locked tables to be placed in the 'hot table' category. SAP has implemented all of these enhancements so that no configuration changes or user intervention are required.

Run-time Logging Enhancements

In an active OLTP system, tables are updated and data is inserted or deleted concurrently, and performance is of prime importance. With all of this happening at once, when log records are transferred to syslogs, SAP ASE acquires a lock on the log, which can become a point of contention.
This translates into performance issues during run-time logging. To avoid this scenario, SAP ASE uses a user log cache for each transaction. A user log cache queue is created and partitioned into smaller blocks, each the size of the server's logical page. SAP ASE moves the log records in each block to a global queue, from which they are transferred to syslogs. The advantage is that SAP ASE sends batches of log records to syslogs instead of individual records, preferably when the transaction completes. This reduces contention and improves run-time logging performance.

SUMMARY

SAP ASE is designed to address the challenges of terabyte-scale environments and continues to address challenges around big data. In the database world, increased data volume translates into issues such as operational scalability, decreased performance and a storage crunch. The features and enhancements introduced in SAP ASE address the challenges around performance and scalability, storage and diagnostics, enhancing the VLDB experience for customers.

WWW.SAP.COM

© 2014 SAP AG or an SAP affiliate company. All rights reserved. No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG. The information contained herein may be changed without prior notice. Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors. National product specifications may vary. These materials are provided by SAP AG and its affiliated companies ("SAP Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or omissions with respect to the materials.
The only warranties for SAP Group products and services are those that are set forth in the express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an additional warranty. SAP and other SAP products and services mentioned herein as well as their respective logos are trademarks or registered trademarks of SAP AG in Germany and other countries. Please see http://www.sap.com/corporate-en/legal/copyright/index.epx#trademark for additional trademark information and notices.