Transcript
AM18 ASA INTERNALS: DATA MANAGEMENT
GLENN PAULLEY, DEVELOPMENT MANAGER
AUGUST 2005

Goals of this presentation
- Overview of data management and query processing in Adaptive Server Anywhere 9.0.2
- Concentrate on performance issues and problem areas
- Provide an overview of SQL Anywhere 9.0 technology
- Highlight planned features for the Jasper release

Agenda
- Section One: SQL language support, data management
- Section Two: query execution and optimization

Design goals of SQL Anywhere Studio
- Ease of administration
- Good out-of-the-box performance
- “Embeddability” features
  - self-tuning
- Cross-platform support
- Interoperability

Motivation for the ASA 9.0 release
- Exploit the new architecture of 8.0 and add support for additional language features, including
  - GROUP BY ROLLUP
  - RECURSIVE UNION
  - Window functions and other OLAP support
  - XML
  - Table Functions
  - INTERSECT and EXCEPT
  - ORDER BY, SELECT TOP N in any query block, including views
- Improve performance

Highlights of the ASA 9.0 releases
- HTTP server
- ASA Index Consultant
- Improved performance, scalability
  - better scalability in OLTP environments
- Query processing improvements
  - optimization refinements – particularly with the server’s cost model
  - histograms modified according to update DML statements
  - alternate, efficient execution methods for complex queries
- SNMP support
  - 9.0.1 EBF build 1828, Windows platforms only
  - Formally part of the 9.0.2 release

Performance, performance, performance
[Chart: version comparison on a 10GB database, elapsed time in minutes for queries Q01 to Q22 plus the average, for builds 7.0.4.2788, 8.0.0.2065, 9.0.0.1073, 9.0.1.1751, and 10.0.1212. The individual data values are not recoverable from this transcript.]

Contents
- Language Support
  - New SQL constructs supported with 9.0.1
- Data Management in 9.0.1
  - Database organization
  - Table storage organization
  - Index storage organization
- Physical database design tips
- Jasper features

New SQL language support in 9.0.1
- Table functions (SELECT over a stored procedure)
- ORDER BY clause now supported in all SELECT blocks
  - Necessary to support SELECT TOP n in derived tables, views, and subqueries with correct semantics
- RECURSIVE UNION (bill-of-materials) queries
- INTERSECT and EXCEPT query expressions
- LATERAL keyword for derived tables
  - Now necessary for derived tables or table expressions containing outer references
- WITH clause (common table expressions)
  - Essentially in-lined view definitions

New SQL language support in 9.0.1
- SELECT TOP n START AT m
  - Equivalent functionality to that in MySQL, Postgres
  - n and m can be variables or host variables
- WITH INDEX hint in FROM clause
- Named CHECK, PK, FK, UNIQUE constraints
  - Constraint violation message refers to the constraint name
  - New catalog tables:
    - SYSCONSTRAINT contains information about all constraints, even referential integrity constraints
    - SYSCHECK contains the body of the CHECK constraint; now permit multiple CHECK constraints on the same column(s)
  - Specific CHECK constraint that is violated appears in error
  - Not available in older database formats, even if DBUPGRAD is used

New SQL language support in 9.0.1
- OLAP support
  - VARIANCE, STD_DEV aggregate functions
  - ORDER BY clause for LIST aggregate function
  - GROUP BY ROLLUP, CUBE, GROUPING SETS
  - Binary set functions (linear regression, co-variance, etc.)
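The GROUP BY ROLLUP, CUBE, and GROUPING SETS entry above can be made concrete with a small Python sketch. This is not ASA code: the function names rollup_sets and cube_sets are made up for illustration, and the sketch only enumerates which grouping sets each construct implies under standard SQL semantics (ROLLUP over N expressions gives the N+1 prefixes of the list, CUBE gives all 2**N subsets). It does not compute any aggregates.

```python
from itertools import combinations


def rollup_sets(cols):
    """Grouping sets implied by ROLLUP: every prefix of cols,
    from the full list down to the empty (grand total) set."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols):
    """Grouping sets implied by CUBE: every subset (the power set) of cols,
    listed from largest to smallest."""
    return [
        combo
        for size in range(len(cols), -1, -1)
        for combo in combinations(cols, size)
    ]


if __name__ == "__main__":
    cols = ["state", "zip"]
    print("ROLLUP:", rollup_sets(cols))  # N + 1 = 3 grouping sets
    print("CUBE:  ", cube_sets(cols))    # 2 ** N = 4 grouping sets
```

For two expressions the CUBE list is (state, zip), (state), (zip), and the empty set, which is the same collection one would spell out by hand in a GROUPING SETS clause.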
  - Rank functions
  - Windowed aggregate functions
    - Construct “moving average” results in a single SQL statement
- Support for multiple DISTINCT aggregate functions in a single SELECT block
  - Necessitates the use of Hash Group By

New SQL language support in 9.0.1
- Support for SET statement in Transact-SQL dialect stored procedures
  - Implemented for MS SQL Server compatibility
- EXECUTE IMMEDIATE extensions
  - Procedures can now use EXECUTE IMMEDIATE to execute dynamically-constructed queries which return a result set
  - WITH ESCAPES ON | OFF
  - WITH QUOTES ON | OFF
- Variable assignment permitted in UPDATE statements (8.0.1)
- SELECT INTO base-table

New SQL language support in 9.0.1
- FOR XML AUTO, FOR XML RAW, FOR XML EXPLICIT, OPENXML procedure (supports XPATH queries over XML column values)
- SQLX functionality: xmlelement(), xmlforest(), xmlgen(), xmlconcat(), and xmlagg()
- EXPRTYPE() function – outputs the type of the expression argument
  - Useful when defining computed columns
- LOCATE() can handle negative offsets
- INSERT WITH AUTO NAME (8.0.2)

Table functions
    SELECT *
    FROM SYS.SYSTABLE as st, sa_table_fragmentation() as tbfrg
    WHERE st.table_name = tbfrg.tablename
- Result set description determined from the catalog; result set must match exactly
  - Otherwise SQLSTATE ‘WP012’
- Workaround: use the WITH clause to annotate the procedure reference in the FROM clause:
    SELECT * FROM PROC() WITH( X Integer, Y char(17) )

Table functions
- Procedure may return only one result set
- Statistics regarding cost, result set cardinality of the procedure are captured at run time; used for subsequent requests
- Statistics are stored in SYS.SYSPROCEDURE
  - Minimally requires DBUPGRAD of older databases to 9.0.0

Recursive UNION
- SQL-2003 implementation of recursive (bill-of-materials) queries
  - Only DB2 also offers RECURSIVE UNION support; Oracle implements a ‘cycle’ clause
- Uses specialized join operators: recursive hash inner and outer joins
  - will utilize a nested-loop strategy if inputs are small; done adaptively at run-time during query execution
    WITH RECURSIVE r (level, emp_id, manager_id) as
      ( SELECT 1, emp_id, manager_id
        FROM employee
        WHERE emp_id = manager_id
        UNION ALL
        SELECT level+1, e.emp_id, e.manager_id
        FROM employee e JOIN r ON (e.manager_id = r.emp_id)
        WHERE e.emp_id <> e.manager_id and level < 3)
    SELECT * FROM r

Recursive UNION: restrictions
- Query expression must be UNION ALL
- Recursive reference must be in a query block that does not contain DISTINCT, aggregation, or an ORDER BY clause
- Recursive reference in a LEFT OUTER JOIN is permitted
- Schema of WITH clause must match recursive query
  - Implicit type conversions involving truncation can yield undesired results; SQLSTATE 42WA2 returned if server detects a type mismatch
  - Use CAST to ensure compatible types
- Infinite queries are possible; server kills the query after N recursions
  - controlled by the new connection option MAX_RECURSIVE_ITERATIONS (default 100)

INTERSECT and EXCEPT
- Implement set/bag difference and set/bag intersection
- Both ALL and DISTINCT variants are supported; DISTINCT performed by default
- Form query expressions in the same fashion as UNION
- NULL treated as a special value in each domain, hence NULLs are equivalent to each other
- Useful when formulating queries that require counting of identical rows
- See the help for order-of-precedence amongst the set operators

EXCEPT and INTERSECT ALL
- Rewrite to transform ALL to DISTINCT done automatically by the optimizer
- Both EXCEPT and INTERSECT can be computed through either a merge or hashing technique
  - Also supports an (expensive) nested-loop strategy in case a cache shortage is encountered
- With ALL variants: implicitly performs aggregation to count the number of duplicate rows in each input
  - A new query execution operator, ROW REPLICATE, generates the required copies of each row
    SELECT description FROM product
    EXCEPT ALL
    SELECT description FROM product as p2 WHERE quantity < 15

GROUP BY ROLLUP
- Computes aggregates as usual, but result set contains multiple sets of groups
- Logically, grouping is performed N+1 times for N grouping expressions
- Essentially implements the functionality of COBOL Report Writer in a single SQL request
    SELECT state, zip, count(*), grouping(zip), grouping(state)
    FROM customer
    GROUP BY ROLLUP (state, zip)

GROUP BY CUBE
- Computes aggregates as usual, but result set contains the power set of the N grouping expressions
- Expensive to execute for large N
- Result can be restricted through the specification of GROUPING SETS
    SELECT state, zip, count(*), grouping(zip), grouping(state)
    FROM customer
    GROUP BY CUBE (state, zip)

    SELECT state, zip, count(*), grouping(zip), grouping(state)
    FROM customer
    GROUP BY GROUPING SETS ( (state, zip), state, zip, () )

WINDOW functions
- Part of SQL OLAP extensions
- Computes aggregates (except LIST) over a window of rows
- Provides an ANSI-compliant way to number the rows of a result set
  - ROW_NUMBER() rather than NUMBER(*)
- Useful to:
  - Compute cumulative aggregates, or “moving averages”
  - Eliminate the need for correlated subqueries involving aggregation

WINDOW functions
- List employees, by department, in four US states by their start dates, along with their cumulative salaries:
    SELECT dept_id, emp_lname, start_date, salary,
           SUM(salary) OVER (PARTITION BY dept_id ORDER BY start_date
                             RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "Sum_Salary"
    FROM employee
    WHERE state IN ('CA', 'UT', 'NY', 'AZ') AND dept_id IN ('100', '200')
    ORDER BY dept_id, start_date;

WINDOW functions
- List all orders (with part information) where the part quantity cannot cover the maximum single order for that part:
    SELECT o.id, o.order_date, p.*
    FROM sales_order o, sales_order_items s, product p
    WHERE o.id = s.id and s.prod_id = p.id
      and p.quantity < (SELECT max(s2.quantity)
                        FROM sales_order_items s2
                        WHERE s2.prod_id = p.id)
    ORDER BY p.id, o.id

    SELECT order_qty.id, o.order_date, p.*, max_q
    FROM ( SELECT s.id, s.prod_id,
                  MAX(s.quantity) OVER (partition BY s.prod_id order by s.prod_id) AS max_q
           FROM sales_order_items s) as order_qty,
         product p, sales_order o
    WHERE p.id = prod_id and o.id = order_qty.id and p.quantity < max_q
    ORDER BY p.id, o.id

WINDOW functions
- Find the salespeople with the best sales (total amount) for each product, including ties:
    SELECT s.prod_id, o.sales_rep,
           SUM(s.quantity) as total_quantity,
           SUM(s.quantity * p.unit_price) as total_sales
    FROM sales_order o KEY JOIN sales_order_items s KEY JOIN product p
    GROUP BY s.prod_id, o.sales_rep
    HAVING total_sales = (SELECT FIRST SUM(s2.quantity * p2.unit_price) as sum_sales
                          FROM sales_order o2 KEY JOIN sales_order_items s2 KEY JOIN product p2
                          WHERE s2.prod_id = s.prod_id
                          GROUP BY o2.sales_rep
                          ORDER BY sum_sales DESC )
    ORDER BY s.prod_id

    SELECT v.prod_id, v.sales_rep, v.total_quantity, v.total_sales
    FROM ( SELECT o.sales_rep, s.prod_id,
                  SUM(s.quantity) as total_quantity,
                  SUM(s.quantity * p.unit_price) as total_sales,
                  RANK() OVER (PARTITION BY s.prod_id
                               ORDER BY SUM(s.quantity * p.unit_price) DESC) as sales_ranking
           FROM sales_order o KEY JOIN sales_order_items s KEY JOIN product p
           GROUP BY o.sales_rep, s.prod_id ) as v
    WHERE sales_ranking = 1
    ORDER by v.prod_id

Data Management in 9.0.2

Moving to ASA 9.0.2
- If database is 8.0.2, unload/reload to 9.0 is largely unnecessary
  - DBUPGRAD to 9.0 required for some catalog schema changes, in particular for the Index Consultant
  - There should be no consequences of using DBUPGRAD with respect to performance
- However:
  - only 9.0 format databases support named constraints
  - only 9.0 format databases support cache warming
  - only 9.0.1 databases support page checksums
  - 8.0.2 databases do not support index statistics collection by default
    - Can be turned on when creating the database via CREATE DATABASE (but not dbinit)

Moving to ASA 9.0.2
- Otherwise, unload/reload from 8.0.1 or 8.0.0 recommended
  - Clustered index support
  - Better statistics management
    - Improved histogram organization, statistics collection
    - Index statistics kept persistent in the database file
  - Improved histograms
  - Cache warming on startup
  - Checksums on database pages
  - PCTFREE option for base and temporary tables

Moving to SQL Anywhere “Jasper”
- The Jasper release of the SQL Anywhere server will not support older database formats
- Jasper will ship with a migration tool to convert an existing database into a Jasper-format database

Database organization
- A database consists of up to 13 “dbspaces”
  - Maximum size of each dbspace is limited by the underlying operating system
  - Maximum database size is also determined by page size
  - Limit for any dbspace is 2**28 pages (256M, roughly 268 million)
- Each dbspace, the temporary file, and the transaction log is a simple OS file
  - Ease of administration, backup
  - Temporary file is used for temporary tables
- A dbspace file grows in 256K extents (512K if 16K pages, 1Mb if 32K pages)
- Database files can be copied to/from different endian machines
  - Can copy database from Wintel to big-endian UNIX systems and back again
  - Server automatically does data conversion where necessary

Database organization
- A database file contains:
  - table pages
  - index pages
  - free pages
  - rollback log pages
  - checkpoint log pages
- Each dbspace for a database must use the same page size

Physical organization: tables
- Each table uses an independent set of table pages
  - Each table allocates at least one page, even if the table is empty
- Server maintains bit-maps for table pages
  - Supports clustering of table pages in the same portion of the database file
  - Facilitates large-block I/O – SQL Anywhere reads 64K at a time when doing sequential scans
  - Result: considerably faster sequential scan performance

Physical organization: tables
- New in 8.0.2: ‘scattered read’ support on Windows 2000 and Windows XP
  - Another mainframe technology being reinvented on PC/UNIX servers, aka “locate-mode I/O”
  - Improves performance, reduces memory requirements
  - Coming to other platforms as vendors implement it
- Tables cannot span dbspaces
- Each secondary index on a table can be stored in a separate dbspace
  - Recommended if multiple spindles are available (not necessary for RAID devices)
- Partition dbspaces on separate devices whenever possible
  - Brings more disk arms to bear, reducing seek latency

Physical organization: tables
- Rows are inserted into pages at a point where, if at all possible, the entire row can be stored contiguously
  - Caveat: row segments are at most 4K; second or subsequent row segments can appear on different pages
- Columns are packed tightly together; only unpadded values are stored on disk
- Primary key columns are always at the beginning of each row, in sequence
  - Server may rewrite all rows if PK added or modified
- Rows can be of (almost) unlimited size; are split across pages where necessary
- Maximum length of any column is 2Gb
- Maximum number of rows per page is 255

Physical organization: tables
- Rows are not guaranteed to be placed in pages corresponding to their insertion order
  - By default, ASA uses a first-fit algorithm for page selection
  - To guarantee ordering of a result set, specify an ORDER BY clause
- Space is not reserved for columns that are null
- BLOB values are stored in a separate “arena” of pages
  - First 255 bytes are stored together with the row
  - Access to the rest of the BLOB value will almost certainly require a SEEK
  - Implications for choice of page size
- Once inserted, a row identifier is immutable
  - An updated row must be split if its new length does not allow it to fit on the page

Physical organization: tables
- Table pages are allocated in 8 page clusters; cluster allocation depends on page size
  - 2K: grow 4 clusters at a time
  - 4K: grow 2 clusters at a time
  - All other page sizes: one cluster at a time
- ASA will re-use database pages for additional inserts if entire pages are freed
- Defaults: for 1K pages, free space is 100 bytes; all other page sizes is 200 bytes
- DBA can specify freespace percentage to accommodate future table UPDATEs using PCTFREE
  - PCTFREE characteristic stored in new catalog table SYSATTRIBUTE (and corresponding table SYSATTRIBUTENAME)
  - Can be specified for temporary tables

Page sizes
- Page sizes supported are 1K, 2K, 4K, 8K, 16K, 32K
  - 2K page size minimum on all UNIX platforms
  - Default changed to 2K in the 6.0.3 release
- A server can support several databases concurrently
  - Buffer pool page size will be the largest database page size specified on the command line
- Consider tradeoffs with your choice of page size
  - 4K recommended; occasionally 8K may offer improved performance
  - Default will change to 4K with Jasper release
  - Do not use 16K or 32K pages unless you have a specialty application
  - In typical environments, large page sizes cause inefficient use of cache

Choice of page size does matter
- Larger rows usually require larger pages (requires fewer split rows)
- Random retrieval performance is dependent on the application
  - Larger pages can pollute the cache with unnecessary data
  - Often require larger buffer pools to accommodate the application’s working set
- Smaller pages are more cache efficient, but
  - Smaller pages reduce index fanout, and can increase index depth

Choice of page size does matter
- Don’t ignore index maintenance costs when considering page size (larger page sizes can mean increased cache pressure)
- Test your application with different alternatives
  - Your mileage may vary
- A 4K page size is a typical choice for many applications
- My recommendation: use 4K pages unless thorough testing proves that a different page size offers better performance/scalability
- See data storage whitepaper
  - Available at www.ianywhere.com/developer
  - Recently updated for 9.0.0

Physical organization: indexes
- ASA 9.0 supports two different types of indexes:
- Hash-based
  - Key is a one-way order-preserving encoding of at most nine bytes of the data values
  - Hash-based indexes are still used when the key length does not satisfy the limits for compressed indexes
- Compressed
  - Contains Patricia tries in the index’s internal nodes
  - Used for keys > 10 bytes and less than 122 bytes with 1K pages, 248 bytes for all other page sizes
  - Substantially improved performance with larger keys

Physical index organization: hash-based indexes
- Values in an index are “hashed” into a key of at most 10 bytes using an order-preserving encoding function
  - WITH HASH SIZE is deprecated
  - Each indexed column encoded separately, with a one-byte length
  - A 10-byte hash value can hold two 32-bit integer values (including two length bytes)
- Hash values in an index are stored separately from the index entry itself
  - The hash value for an identical secondary key is shared for each index entry (row) in that index page
  - This improves fanout when data distribution is skewed

Physical index organization: Compressed indexes
- Internal nodes in the index contain a Patricia trie
  - PATRICIA: Practical Algorithm to Retrieve Information Coded in Alphanumeric (D. R. Morrison, J. ACM Vol. 15, 1968)
  - Combines a binary trie with an optimization to skip over bit comparisons that would result from one-way branching
- Result: automatic compression of string data
  - Excellent fanout of internal nodes
  - Common substrings of key values have a negligible impact on space requirements and performance
  - Superb performance improvements in many cases, especially with composite primary and foreign keys

Clustered index support
- First offered with the 8.0.2 release
- At most one clustered index per table (may be a temporary table)
  - May be secondary index, PK, FK, UNIQUE constraint
  - Optimizer assumes PK indexes are clustered unless a different clustering index exists
  - Engine will not attempt to maintain clustering on PK indexes unless they are declared CLUSTERED
  - May be hash or compressed index
- Clustering characteristic stored in SYSATTRIBUTE catalog table
- CLUSTERED keyword can be used in both CREATE INDEX and CREATE/ALTER TABLE statements
  - However, ALTER does not reorganize the table; use REORGANIZE TABLE

Clustered index support
- On INSERT/LOAD TABLE, server attempts to keep rows physically adjacent in base table pages
  - Specification of PCTFREE on LOAD can be critical
  - Adjacency is NOT guaranteed; ORDER BY still requires a physical sort or indexed retrieval
- Can significantly improve performance
  - Optimizer costs clustered index access differently
  - Consider their use with queries that involve range predicates
  - Often useful with DATE or TIMESTAMP columns
- Use REORGANIZE TABLE or UNLOAD/RELOAD if clustering degrades over time
- ALTER INDEX statement can rename an index or change its clustering attribute

Physical index organization: fanout and page size
- Fanout refers to the number of index entries on a page
- Lower fanout means greater index depth, and hence more costly random retrieval
- Fanout is affected by
  - Page size
  - Hash value size/trie compression
  - Distribution of key values
  - Index maintenance
- Fanout can degrade over time
  - sa_index_density() procedure

Indexes and query processing
- ASA does not store actual data values in the index
  - implies each base row must be retrieved to
    - Fetch the values of any attributes, or
    - To compare keys longer than the maximum hash value size
- Indexes are automatically created to enforce referential integrity
  - Primary keys, foreign keys, unique constraints
  - All related indexes must be the same type (hash or compressed)
- Maximum number of indexes is dependent on page size
  - <= 4K: 2048 indexes
  - 8K: 1024 indexes
  - 16K: 512 indexes
  - 32K: 256 indexes

Indexes and query processing
- Each indexed column can be ascending or descending
- Index is scanned backwards if the application scrolls in the opposite direction, or an ORDER BY clause specifies the reverse sequence
- Support for merge and hash joins means that ASA will often use sequential scans, rather than indexed retrieval

REORGANIZE Statement – base tables
    REORGANIZE TABLE tablename
- Defragments rows on-the-fly by removing/inserting groups of rows in clustered index (or PK) order
- Exclusive lock held on the table while a group is processed; commits occur periodically to enable other applications to run, checkpoints are suspended while the group is being processed
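Stepping back to the fanout slide above: the claim that lower fanout means greater index depth can be illustrated with a few lines of Python. This is a back-of-the-envelope model only, assuming an idealized balanced index in which every page holds exactly `fanout` entries. It is not how ASA lays out hash-based or compressed indexes, and the function name index_depth and the sample fanout values are hypothetical.

```python
def index_depth(num_entries, fanout):
    """Smallest number of levels d such that fanout ** d >= num_entries,
    for an idealized index whose pages each hold exactly `fanout` entries."""
    if num_entries < 1 or fanout < 2:
        raise ValueError("need num_entries >= 1 and fanout >= 2")
    depth, capacity = 1, fanout
    while capacity < num_entries:
        capacity *= fanout  # each extra level multiplies capacity by the fanout
        depth += 1
    return depth


if __name__ == "__main__":
    rows = 1_000_000
    for fanout in (20, 100, 500):  # hypothetical entries-per-page values
        print(f"fanout {fanout:>3}: depth {index_depth(rows, fanout)}")
```

In this model a random lookup touches one page per level, so anything that shrinks fanout (smaller pages, longer keys, poorly packed pages after heavy update activity) shows up directly as extra page reads.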
- Performs implicit COMMITs during operation
- Rows will be in clustered sequence when operation is complete (except possibly concurrent UPDATES)
- Use new procedure sa_table_fragmentation() to discover tables that warrant reorganization

REORGANIZE Statement - indexes
    REORGANIZE TABLE tablename [ index specification ]
      INDEX indexname
      FOREIGN KEY indexname
      PRIMARY KEY
- Exclusive lock is held throughout
- CHECKPOINTs are suspended
- Reclaims space lost to update activity
- Re-balances the index, especially important after many DELETE operations
- Use the new procedure sa_index_density() to identify indexes that require reorganization

Data management improvements in 9.0.1
- Better scalability – new lock-free cache manager
  - Substantially better performance across the board
- Support for page checksums
  - New option for dbinit and CREATE DATABASE statement
  - Supported by dbvalid utility, and a new statement VALIDATE CHECKSUM
  - Overhead: largely depends on CPU speed. Examples:
    - 2.8 milliseconds per I/O for 32K pages
    - 0.7 milliseconds per I/O for 8K pages
- Improvements to dynamic cache sizing
  - Sampling rate changes with database growth or the starting of a new database on the same server

Data management improvements in 9.0.1
- Database cache warming feature
  - Two operational phases, collection and reload
  - During collection, page IDs are saved in the database as they are accessed at startup
  - During reload, collected page IDs are read into cache as background processing
  - Checks and balances used to prevent swamping the server with I/O during server startup
  - Need to test performance before deploying
  - Cache warming is *enabled* by default

Data management improvements in 9.0.1
- Optimistic locking introduced for WAIT_FOR_COMMIT
  - Controlled by a new connection option OPTIMISTIC_WAIT_FOR_COMMIT
- Temporary dbspace can be grown with ALTER DBSPACE
  - Can improve performance of complex queries by ensuring that the temp file is not fragmented on disk
- Size of temporary dbspace can be controlled with a governor
  - New public option TEMP_SPACE_LIMIT_CHECK (default OFF)
  - When OFF, engine’s default behaviour is to die with a DISK FULL error
  - Jasper release: default is ON
  - Server computes a temp space quota for each request; if quota is exceeded and temporary dbspace is at least 80% of its maximum size, request fails with SQLSTATE 54W05
  - Quota computed using amount of disk free space on that partition, and number of active connections
  - Shipped in 9.0.0 build 1308, 9.0.1 build 1872, 8.0.3 build 4991

Data management improvements in 9.0.1
- ALTER INDEX statement
  - Can rename an index, or alter its clustering attribute
- Ability to create an index on a function
  - Automatically adds a computed column “column-name” to the table
  - Creates an index on the computed column
  - Relies on the optimizer to replace any function occurrences with the computed column
    CREATE INDEX index-name ON [owner.]table-name
      ( function( arg [, ...] ) AS column-name )
      [{IN | ON} dbspace-name]

Data management improvements in 9.0.1
- Non-transactional temporary tables
  - Unaffected by COMMIT or ROLLBACK; no entries made to rollback log
- Procedure, trigger, and view text can be hidden from other users by using SET HIDDEN (8.0.2)
- LOAD TABLE enhancements:
  - can be used on local temporary tables (8.0.2)
  - ORDER clause (8.0.2)
  - Control over which column histograms are built (9.0.0)

Data management improvements in 9.0.1
- DEDICATED_TASK option (DBA-only, temporary only)
- UUIDs and GUIDs can be used as surrogate keys - see newid() function (8.0.2)
- XML data type
- SYSHISTORY system table
- Statistics (depth, leaf pages) maintained on indexes in real time (introduced in 8.0.2EBF)
- Hash(), compress(), encrypt() builtin functions
  - Can be used to compress or encrypt individual string or binary fields in the database
  - Values can be viewed, processed with decrypt() and decompress() functions

Data management improvements in 9.0.1
- ALTER DATABASE can now modify transaction log identically to DBLOG utility
- BACKUP and DBBACKUP can now rename the log copy
- ALTER VIEW WITH RECOMPILE
- Event handling improvements:
  - Two new parameters for event_parameter:
    - APPINFO
    - DisconnectReason: ‘from client’, ‘drop connection’, ‘liveness’, ‘inactive’, ‘connect failed’
- New cost model for Ultralite requests
  - New DTT function based on analysis of several current models of pocket PC devices
  - Equates random and sequential I/O to produce better Ultralite query plans

Data management improvements in 9.0.2
- Temporary stored procedures
  - they are visible only by the connection which creates them, and are automatically dropped when the connection is dropped.
  - they can be explicitly dropped, but may not be ALTERed.
  - GRANT and REVOKE are not permitted on temporary procedures.
  - they are not recorded in the catalog or in the transaction log
  - they can be created and dropped when connected to a read-only database
  - a procedure owner cannot be specified for temporary procedures. Rather, they are owned by the user that creates them.
  - temporary external procedures are not permitted
  - temporary procedures execute with the permissions of their creator (i.e. the current user)

Data management improvements in 9.0.2
- CREATE LOCAL TEMPORARY TABLE defines a local temporary table which will persist until the end of a connection, or until the table is explicitly dropped.
Intended for use inside procedures, functions, triggers Similar to DECLARE LOCAL TEMPORARY table if executed outside of a procedure context UUIDs are now a native SQL Anywhere type UUID_HAS_HYPHENS option Controls formatting of UUIDs (UniqueIdentifier values) when converted to strings Disk-full callback support MIN_TABLE_SIZE_FOR_HISTOGRAM is deprecated New option COLLECT_STATISTICS_ON_DML_UPDATES New option LOG_DEADLOCKS, sa_report_deadlocks() procedure Enhancements to START DATABASE statement: WITH DISTINCT SQLSTATE 57 Application profiling improvements in 9.0.2 Procedure profiling can now be performed for an individual connection or user call sa_server_option('Profile_connection',<connection-id>) call sa_server_option('ProfileFilterUser','<userid>') Request-level logging enhancements: New –zn switch to retain n log files in a ring Or use sa_server_option('RequestLogNumFiles',<n>) Can log either text or the plan for expensive queries (9.0.2EBF) -zx <cost> specifies the threshold cost, which if exceeded at either optimization or execution time the statement is logged Call sa_server_option(‘LogExpensiveQueries’) When –zp is also specified, the plans are output; otherwise, only the statement text is logged 58 Physical database design tips 59 Physical database design tips: file placement Database file placement Place transaction log, database file(s), and temporary directory on separate devices if possible if using mirrored logging, ensure the two logs are on different physical disks Temporary file placement can dramatically affect performance of complex queries Use the ASTMP environment variable to specify location for temporary file Place on a different physical drive if possible The more disk heads the better (RAID) 60 Physical database design tips: file placement Consider the use of caching disk controllers/NT striping/RAID Consider the tradeoffs Software striping offers better performance, but offers no recovery advantages RAID 5 tends to have poor write request 
latency: each I/O turns into four write requests that take place serially Not good for a transaction log RAID 10 (1+0) offers much better performance, at the cost of redundancy 61 File system considerations Defragment your file system occasionally, especially after an unload/reload Database file fragmentation is now displayed in the console window when the database is started Preallocate large quantities of space in contiguous chunks through the ALTER DBSPACE command Less problematic with 256K block allocation in recent ASA releases ALTER DBSPACE <dbspace-name> INSERT nnn {PAGES | KB | MB | GB | TB} Can also do this for the TEMPORARY DBSPACE Use db_extended_property() function to determine fragmentation/size of each dbspace individually (new in 9.0, also in 8.0.2.4215) Can be done for temporary dbspace and the transaction log as well 62 File system considerations Use caution when trying to run the database over a networked drive! Not all networks and/or operating systems guarantee network packet ordering Physical or logical corruption is likely Can use “-r” (read-only) switch if necessary SAN units are supported; they guarantee consistent semantics Do not use cached filesystem writes unless persistence is guaranteed Corruption is virtually certain and database cannot be recovered; will need to restore database from backup 63 Database fragmentation ASA databases never shrink Free pages will be reused for other purposes Unload/reload will recover this unused space If data is removed in the order it was inserted, fragmentation is less likely Avoid inserts of NULL values followed by updates with actual data use PCTFREE if necessary Repair fragmentation with unload/reload, or REORGANIZE TABLE Useful tools DBINFO -u stored procedure sa_table_fragmentation() 64 Physical database design tips: tables Load table data in clustering order (by default, primary key sequence) Sorting automatically performed by DBUNLOAD and by the REORGANIZE TABLE statement New ORDER syntax for 
(UN)LOAD TABLE Use 4K pages unless conditions warrant Watch for ordering, placement of PK columns Order in table dictates order in index Changed in Jasper! Rows are rewritten if PK columns, or column order, is changed 65 Physical database design tips: tables Use of out-of-range default values instead of NULL Reduces page fragmentation with updates Can use PCTFREE as an alternative Put large columns at end of row; fixed-size and frequentlyaccessed columns near start Prevent seeks to another table page, required to access split rows Choose your data types with care; tradeoff storage efficiency with application requirements For keys, alphanumeric strings are often more flexible 66 Physical database design tips: indexes Compressed indexes prevent many of the problems with relatively large or composite primary keys However: Surrogate keys can still be useful Usually not a good idea for significant business objects to have the same key format Self-checking keys can simplify business processing Watch for opportunities to specify a clustering index Especially with date or timestamp columns used in range queries Useful stored procedures: sa_index_levels() sa_index_density() 67 Physical database design tips: surrogate keys Consider surrogate keys when appropriate Exploit autoincrement support, or develop self-checking keys to simplify error detection 9.0 and 8.0.2 support automatic generation of universal unique identifiers (UUIDs) as surrogate keys Compatible with Microsoft’s implementation New native domain: uniqueidentifier in 9.0.2 No longer necessary to use string conversion functions such as uuidtostr(); type conversion done automatically Tradeoff their characteristics with GLOBAL AUTOINCREMENT 68 Physical database design tips: foreign keys Foreign keys are essential to the optimization of complex queries Join selectivity and cardinality estimation is much more accurate when foreign key constraints are present Also enable a variety of query rewrite optimizations But 
there is a tradeoff in using declarative referential integrity
    • Downside is the maintenance cost for indexes that are not utilized in query processing
    • In rare situations, consider eliminating some RI and check constraints once the application is fully tested

Physical database design tips: triggers, constraints
• Use declarative referential integrity instead of triggers
• Use CHECK constraints rather than triggers for simple conditions
    • 9.0 supports named constraints
    • Unnamed constraints are automatically named as ‘ASAnnn’
• Mark columns as NOT NULL when appropriate
• Don’t over-use CHECK constraints, e.g. in user-defined data types
    • Using a user-defined function in a CHECK constraint will guarantee poor update performance

Server configuration tips: cache size
• Dynamic cache sizing is instituted by default on platforms that support it
    • Not supported for CE, Netware
    • Can override dynamic cache sizing as necessary
    • Server can dynamically adjust cache size depending on server workload; this is more robust in 9.0.1
    • Use -ch to specify an upper bound larger than 256MB
• If specifying cache size at startup:
    • Need to allow for OS and application overhead
    • CE has different defaults than other platforms
    • Java-enabled databases require a larger minimum cache for the Java VM; 8MB is usually sufficient
• Watch for NT File Cache competition
• See white paper on memory usage (available at http://www.ianywhere.com/developer)

Data management in Jasper
Statements concerning iAnywhere Solutions' new products are forward-looking statements that involve a number of uncertainties and risks and cannot be guaranteed. Factors that could ultimately affect such statements are detailed from time to time in Sybase's Securities and Exchange Commission filings, including but not limited to its annual report on Form 10-K and its quarterly reports on Form 10-Q (copies of which can be viewed on the Company's website).
All of the information in this presentation are forward-looking statements, as defined above. As such, there is uncertainty associated with if or when any of these features will be added to the product.

Data management changes in Jasper
• Default page size changed to 4K
• New catalog implementation
    • Catalog base tables have been renamed
    • All catalog access by applications is through views
    • Catalog base tables are reorganized, more efficient
    • View dependencies on base tables and views are now tracked
• Improved storage organization for BLOB columns
    • In-row BLOB prefix default is no longer fixed at 254:
        • CHAR/VARCHAR: minimum 8, maximum 128
        • BINARY/VARBINARY: minimum 0, maximum 256
        • Can override on per-column basis
    • New storage architecture for long values, permits efficient random access

View dependency tracking
• Three states for any view:
    • Valid: compiled and active, can be utilized in queries
    • Invalid: view has been invalidated by the server due to dependency checking as a result of DDL on base tables
        • Upon reference, the server will attempt to compile the view and use it if possible
        • Otherwise, query will get an error
    • Disabled: view has been explicitly disabled (via new statement, DISABLE VIEW), and is unusable
        • View must be explicitly enabled in order to become valid (via new statement, ENABLE VIEW)

View dependency tracking
• Upon an ALTER (or DROP):
    • Server attempts to acquire an exclusive lock on the object to be modified
        • Server honours the current setting of the BLOCKING option
    • Server then acquires exclusive locks on all dependent views
        • If any lock cannot be acquired, the statement gets an error
    • Once locked, all dependent views are invalidated
    • ALTER (or DROP) statement is executed
    • With ALTER, the server attempts to revalidate all the previously invalidated views
        • Views successfully recompiled are marked as valid
        • Otherwise, the view is left in the invalid state
• Server will attempt to recompile it when
    • First referenced in a
server session, or
    • When other DDL is performed that may affect that view

Internationalization improvements
• Support for NCHAR data type
    • NCHAR strings are stored as UTF-8
    • NCHAR specification and functions use character semantics, not byte semantics
        • NCHAR(10) means 10 characters (1-4 bytes per character)
• CHAR specification now supports either BYTE or CHAR modifier
    • E.g. CHAR(10 BYTE) or CHAR(23 CHAR)
• NCHAR can support either
    • UCA (Unicode Collation Algorithm) using IBM’s ICU library
        • Properly supports multi-byte character sorting
    • A legacy collation stored as UTF-8
• Database now can have two collations, one for NCHAR, one for CHAR
• Details in session SQL506 Monday afternoon

Indexing changes
• New index implementation
    • Improved implementation of compressed B-tree indexes
    • Key values are duplicated in the index to support index-only retrieval and snapshot isolation
    • Older “hash”-based indexes have been dropped entirely
• Index column order for primary keys now based on PK constraint declaration, not column order in table
    • PK can be altered, reordered without rewriting all the rows in the table
• Sort order can now be specified for any constraint index
    • e.g. PRIMARY KEY (X ASC, Y DESC, Z ASC)
• Foreign key column order can now be different than that of PK
• All indexes now appear in the SYSINDEXES view
• Planned:
    • Ability to declare that a FK is unique (to enforce a 1:1 relationship)
    • Abstract indexes into logical and physical implementations
        • Redundant indexes will not be created

Shareable global temporary tables
• Shared global temporary tables
    • New syntax: CREATE GLOBAL TEMPORARY TABLE ….. SHARE BY ALL
• The contents of the table will persist until explicitly deleted or until the database is shut down. On database startup, the table will be empty.
• Row locking on shared temporary tables behaves the same as for permanent tables
• Inserts, updates and deletes on shared temporary tables are not recorded in the transaction log
• Column statistics are maintained in memory by the server.
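As a rough illustration of the shared temporary table behaviour described on this slide, the sketch below creates such a table and touches it from two connections. Only the SHARE BY ALL clause comes from the slide; the table name, the column names, and the ON COMMIT PRESERVE ROWS clause are assumptions made for the example, and the final Jasper syntax may differ.

```sql
-- Sketch only: active_sessions and its columns are hypothetical, and
-- ON COMMIT PRESERVE ROWS is an assumed companion clause.
-- SHARE BY ALL is the clause named on the slide.
CREATE GLOBAL TEMPORARY TABLE active_sessions (
    session_id  INTEGER      NOT NULL PRIMARY KEY,
    user_name   VARCHAR(128) NOT NULL
) ON COMMIT PRESERVE ROWS
  SHARE BY ALL;

-- Connection A: insert and commit a row.
-- Per the slide, this change is not recorded in the transaction log.
INSERT INTO active_sessions (session_id, user_name) VALUES (1, 'alice');
COMMIT;

-- Connection B: because the table is shared, the committed row is visible here,
-- subject to the same row locking rules as a permanent table.
SELECT session_id, user_name FROM active_sessions;

-- Rows stay in place until they are deleted explicitly
-- or the database is shut down.
DELETE FROM active_sessions WHERE session_id = 1;
COMMIT;
```

Since the contents do not survive a database restart, a table like this would suit transient, cross-connection state rather than data that must be recoverable.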
Data management changes in Jasper
• Last modification time for any row in a table now retained in SYSTABLE
    • Resolution is one second
• LOAD TABLE enhancements: better performance, ENCODING option, ROW DELIMITED BY option
• Apply multiple transaction logs at startup (can specify a directory)
• Better row-level locking implementation
    • Elimination of key-range locking with anti-insert locks
    • Planned: introduction of INTENT locks (e.g. FETCH FOR UPDATE)
• Improved administration of large databases:
    • Parallel backup
    • Auto-tuning to exploit multiple CPUs on SMP hardware
    • Faster unload/reload, index creation, database validation

Database mirroring
• Provides “hot” failover for a SQL Anywhere database
• Involves two or three separate servers: primary, mirror, arbiter
• Transaction log pages are passed from the primary server to the mirror to keep the mirror up-to-date
    • Mirror server is not accessible by any other connections
    • Effectively the mirror server is in continuous recovery mode
• Log pages can be passed in three modes:
    • Synchronously (default) on COMMIT
    • Asynchronously on COMMIT – better performance than synchronous mode
    • Asynchronously when log page is full, with a timeout option
    • Async implies the usual caveats with possible lost transactions
• Role switch occurs if primary server fails
    • Arbiter used to verify the mirror state before role switch proceeds
    • Clients are disconnected from the primary server
    • Must reconnect to the mirror
• See Techwave session SQL508 – High Availability ASA on Wednesday

Snapshot isolation support
• Provides read-consistency in the face of concurrent writes from other transactions (e.g.
writers do not block readers)
• Enabled by a global database option, allow_snapshot_isolation
• Three new transaction isolation levels:
    • “snapshot” – cleanest semantics, transaction sees a consistent view of the database as of transaction start (the time the first row was accessed)
    • “stmt-snapshot” – requires fewer resources; however, each statement sees a consistent state of the database, but at different times
        • Only one snapshot time exists for a connection; outermost or first statement sets the transaction time
    • “read-only-stmt-snapshot” – like stmt-snapshot, but only for queries; update statements execute at isolation level 1
• Usage is not free
    • Old copies of rows are maintained in a “row version store” (part of the database’s temp file) for as long as necessary to ensure consistency for any transaction
    • Indexes have a mix of “old” and “current” values
    • Can affect the performance of both sequential and index scans

Snapshot isolation support
• Setting the isolation level:
    • set transaction isolation level snapshot
    • set transaction isolation level statement snapshot
    • set transaction isolation level read only statement snapshot
• Or within an ODBC application, use
    • SA_SQL_TXN_SNAPSHOT
    • SA_SQL_TXN_STATEMENT_SNAPSHOT
    • SA_SQL_TXN_READ_ONLY_STATEMENT_SNAPSHOT
• Update conflicts are still possible
• Isolation levels can be mixed (but this is not recommended)
• Database property VersionStorePages contains the number of pages in the temp file devoted to copies of old rows
    • BLOB values do not reside in the temp file, but remain in the main database file and are reference counted
• Some restrictions on DDL when snapshot transactions are in progress (ALTER TABLE, etc.)

Lazy CHECKPOINTs
• A Jasper server can now initiate a CHECKPOINT and perform other operations while it takes place.
• In previous releases, all database activity would stop while the CHECKPOINT took place.
• There can only be one CHECKPOINT in progress at a time.
• If a CHECKPOINT is already in progress, then any operation like an ALTER TABLE or CREATE INDEX that wants to initiate a new CHECKPOINT needs to wait for the last one to finish.
• Lazy checkpoints are not used if using the -m option
• Documented by START CHECKPOINT and FINISH CHECKPOINT records in the transaction log

Application profiling and request-level logging
• Major enhancements in the Jasper release
• Unified logging architecture
    • Can log data to a database, rather than a flat file
    • Can log data to a different database, even on another server
    • Much lower overhead
• Considerably greater detail in diagnostic information
    • Lock contention
    • Statements within stored procedures and triggers
    • Elapsed times
    • Query plans
• Planned improvements to DBCONSOLE for real-time server status
• Attend sessions SQL501/514 Tuesday afternoon at 1:30: ASA Performance Analysis from Start to Finish

iAnywhere at TechWave 2005
Ask the iAnywhere Experts on the Technology Boardwalk (exhibit hall)
• Drop in during exhibit hall hours and have all your questions answered by our technical experts!
• Appointments outside of exhibit hall hours are also available to speak one-on-one with our Senior Engineers. Ask questions or get your yearly technical review – ask us for details!
TechWave ToGo Channel
• TechWave ToGo, an AvantGo channel providing up-to-date information about TechWave classes, events, maps and more, now available via your handheld device!
• www.ianywhere.com/techwavetogo
iAnywhere Developer Community - A one-stop source for technical information!
• Access to newsgroups, new betas and code samples
• Monthly technical newsletters
• Technical whitepapers, tips and online product documentation
• Current webcast, class, conference and seminar listings
• Excellent resources for commonly asked questions
• All available express bug fixes and patches
• Network with thousands of industry experts
http://www.ianywhere.com/developer/

SQL Anywhere ‘Jasper’ Release
Learn more about 'Jasper', the upcoming SQL Anywhere release, loaded with features focused on:
• Enhanced data management including performance, data protection, and developer productivity
• Innovative data movement including manageability, flexibility and performance, and messaging
Attend the following sessions:
• SQL Anywhere 'Jasper' New Feature Overview: Session SQL512 will be held Monday, August 22nd, 1:30pm
• MobiLink 'Jasper' New Feature Overview: Session SQL515 will be held Wednesday, August 24th, 1:30pm
... and remember to look for sneak peeks in other sessions and morning education courses!
Register for the Jasper Beta program: www.ianywhere.com/jasper

Questions?