AM18
ASA INTERNALS: DATA MANAGEMENT
GLENN PAULLEY, DEVELOPMENT MANAGER
AUGUST 2005
Goals of this presentation
Overview of data management and query processing in
Adaptive Server Anywhere 9.0.2
 Concentrate on performance issues and problem areas
 Provide an overview of SQL Anywhere 9.0 technology
 Highlight planned features for the Jasper release
Agenda
 Section One: SQL language support, data management
 Section Two: query execution and optimization
2
Design goals of SQL Anywhere Studio
 Ease of administration
 Good out-of-the-box performance
 “Embeddability” features → self-tuning
 Cross-platform support
 Interoperability
3
Motivation for the ASA 9.0 release
Exploit the new architecture of 8.0 and add support for
additional language features, including
 GROUP BY ROLLUP
 RECURSIVE UNION
 Window functions and other OLAP support
 XML
 Table Functions
 INTERSECT and EXCEPT
 ORDER BY, SELECT TOP N in any query block, including views
Improve performance
4
Highlights of the ASA 9.0 releases
HTTP server
ASA Index Consultant
Improved performance, scalability
 better scalability in OLTP environments
Query processing improvements
 optimization refinements – particularly with the server’s cost model
 histograms modified according to update DML statements
 alternate, efficient execution methods for complex queries
SNMP support
 9.0.1 EBF build 1828, Windows platforms only
 Formally part of the 9.0.2 release
5
Performance, performance,
performance
[Chart: version comparison, 10 GB database, elapsed time in minutes for queries Q01 to Q22 and their average (Avg), for builds 7.0.4.2788, 8.0.0.2065, 9.0.0.1073, 9.0.1.1751 and 10.0.1212]
6
Contents
 Language Support
 New SQL constructs supported with 9.0.1
 Data Management in 9.0.1
 Database organization
 Table storage organization
 Index storage organization
 Physical database design tips
 Jasper features
7
New SQL language support in 9.0.1
 Table functions (SELECT over a stored procedure)
 ORDER BY clause now supported in all SELECT blocks
 Necessary to support SELECT TOP n in derived tables, views,
and subqueries with correct semantics
 RECURSIVE UNION (bill-of-materials) queries
 INTERSECT and EXCEPT query expressions
 LATERAL keyword for derived tables
 Now necessary for derived tables or table expressions containing
outer references
 WITH clause (common table expressions)
 Essentially in-lined view definitions
8
New SQL language support in 9.0.1
 SELECT TOP n START AT m
 Equivalent functionality to that in MySQL, Postgres
 n and m can be variables or host variables
 WITH INDEX hint in FROM clause
 Named CHECK, PK, FK, UNIQUE constraints
 Constraint violation message refers to the constraint name
 New catalog tables:
 SYSCONSTRAINT contains information about all constraints, even
referential integrity constraints
 SYSCHECK contains the body of each CHECK constraint; multiple CHECK constraints are now permitted on the same column(s)
 The specific CHECK constraint that is violated appears in the error message
 Not available in older database formats, even if DBUPGRAD is
used
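SELECT TOP n START AT m maps directly onto the LIMIT/OFFSET paging idiom of MySQL and PostgreSQL, assuming (as the SQL Anywhere documentation describes) that START AT counts from 1. The Python sketch below applies that mapping to a list standing in for an ordered result set; top_start_at is a hypothetical name. In SQL, it is the ORDER BY clause that makes the page boundaries deterministic.

```python
def top_start_at(ordered_rows: list, n: int, m: int = 1) -> list:
    """Return rows m .. m+n-1 (1-based) of an already-ordered result,
    mimicking SELECT TOP n START AT m."""
    if n < 0 or m < 1:
        raise ValueError("n must be >= 0 and m must be >= 1")
    offset = m - 1  # the same page as LIMIT n OFFSET (m - 1)
    return ordered_rows[offset:offset + n]


# Page through a 25-row ordered result, 10 rows per page.
rows = list(range(1, 26))
page_size = 10
pages = [top_start_at(rows, page_size, start)
         for start in range(1, len(rows) + 1, page_size)]
for number, page in enumerate(pages, start=1):
    print(number, page)
```

Because n and m can be host variables, an application can reuse one prepared statement for every page.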
9
New SQL language support in 9.0.1
 OLAP support
 VARIANCE, STD_DEV aggregate functions
 ORDER BY clause for LIST aggregate function
 GROUP BY
 ROLLUP, CUBE, GROUPING SETS
 Binary set functions (linear regression, covariance, etc.)
 Rank functions
 Windowed aggregate functions
 Construct “moving average” results in a single SQL statement
 Support for multiple DISTINCT aggregate functions in a single
SELECT block
 Necessitates the use of Hash Group By
10
New SQL language support in 9.0.1
 Support for SET statement in Transact-SQL dialect stored
procedures
 Implemented for MS SQL Server compatibility
 EXECUTE IMMEDIATE extensions
 Procedures can now use EXECUTE IMMEDIATE to execute
dynamically-constructed queries which return a result set
 WITH ESCAPES ON | OFF
 WITH QUOTES ON | OFF
 Variable assignment permitted in UPDATE statements (8.0.1)
 SELECT INTO base-table
11
New SQL language support in 9.0.1
 FOR XML AUTO, FOR XML RAW, FOR XML EXPLICIT,
OPENXML procedure (supports XPATH queries over XML
column values)
 SQLX functionality: xmlelement(), xmlforest(), xmlgen(),
xmlconcat(), and xmlagg()
 EXPRTYPE() function – outputs the type of the expression
argument
 Useful when defining computed columns
 LOCATE() can handle negative offsets
 INSERT WITH AUTO NAME (8.0.2)
12
Table functions
SELECT *
FROM SYS.SYSTABLE as st, sa_table_fragmentation() as tbfrg
WHERE st.table_name = tbfrg.tablename
 Result set description determined from the catalog; result set
must match exactly
 Otherwise SQLSTATE ‘WP012’
 Workaround: use the WITH clause to annotate the procedure
reference in the FROM clause:
SELECT * FROM PROC() WITH( X Integer, Y char(17) )
13
Table functions
 Procedure may return only one result set
 Statistics regarding cost, result set cardinality of the procedure
are captured at run time; used for subsequent requests
 Statistics are stored in SYS.SYSPROCEDURE
 Minimally requires DBUPGRAD of older databases to 9.0.0
14
Recursive UNION
 SQL-2003 implementation of recursive (bill-of-materials) queries
 Of other vendors, only DB2 offers RECURSIVE UNION support; Oracle
implements a ‘cycle’ clause
 Uses specialized join operators: recursive hash inner and outer
joins
 will utilize a nested-loop strategy if inputs are small; done adaptively at
run-time during query execution
WITH RECURSIVE r (level, emp_id, manager_id) as (
SELECT 1, emp_id, manager_id
FROM employee
WHERE emp_id = manager_id
UNION ALL
SELECT level+1, e.emp_id, e.manager_id
FROM employee e JOIN r ON (e.manager_id = r.emp_id)
WHERE e.emp_id <> e.manager_id and level < 3)
SELECT * FROM r
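A recursive query of this shape is evaluated iteratively: the anchor member seeds a working table, and each pass joins only the rows produced by the previous pass back to employee until a pass yields nothing. The Python sketch below imitates that evaluation over hypothetical (emp_id, manager_id) pairs. Its max_iterations guard plays a role similar to the MAX_RECURSIVE_ITERATIONS option, though the server's exact counting rule is an assumption here.

```python
def recursive_union(employees, max_level=3, max_iterations=100):
    """Iteratively evaluate the WITH RECURSIVE query over (emp_id, manager_id) pairs.

    Returns (level, emp_id, manager_id) rows in the order they are produced.
    """
    # Anchor member: whoever manages themselves is the root (level 1).
    delta = [(1, emp, mgr) for emp, mgr in employees if emp == mgr]
    result = list(delta)
    iterations = 0
    while delta:
        iterations += 1
        if iterations > max_iterations:
            raise RuntimeError("recursion limit exceeded; the data may contain a cycle")
        # Recursive member: join only the previous pass's rows (r) back to
        # employee (e) on e.manager_id = r.emp_id, keeping the WHERE filters.
        delta = [
            (level + 1, emp, mgr)
            for level, r_emp, _ in delta
            for emp, mgr in employees
            if mgr == r_emp and emp != mgr and level < max_level
        ]
        result.extend(delta)  # UNION ALL: duplicates are kept
    return result


# Hypothetical org chart: 1 is the root; 4 reports to 2; 5 and 6 sit deeper.
employees = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 4), (6, 5)]
for row in recursive_union(employees):
    print(row)
```

Employees 5 and 6 are never reached because the level < 3 predicate stops the recursion; without such a predicate, cyclic data would recurse until the iteration guard fires.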
15
Recursive UNION: restrictions
 Query expression must be UNION ALL
 Recursive reference must be in a query block that does not
contain DISTINCT, aggregation, or an ORDER BY clause
 Recursive reference in a LEFT OUTER JOIN is permitted
 Schema of WITH clause must match recursive query
 Implicit type conversions involving truncation can yield undesired
results; SQLSTATE 42WA2 returned if server detects a type
mismatch
 Use CAST to ensure compatible types
 Infinite queries are possible; server kills the query after N
recursions
 controlled by the new connection option
MAX_RECURSIVE_ITERATIONS (default 100)
16
INTERSECT and EXCEPT
 Implement set/bag difference and set/bag intersection
 Both ALL and DISTINCT variants are supported; DISTINCT
performed by default
 Form query expressions in the same fashion as UNION
 NULL treated as a special value in each domain, hence NULLs
are equivalent to each other
 Useful when formulating queries that require counting of
identical rows
 See the help for order-of-precedence amongst the set
operators
17
EXCEPT and INTERSECT ALL
 Rewrite to transform ALL to DISTINCT done automatically by
the optimizer
 Both EXCEPT and INTERSECT can be computed through
either a merge or hashing technique
 Also supports an (expensive) nested-loop strategy in case a
cache shortage is encountered
 With ALL variants:
 implicitly performs aggregation to count the number of duplicate
rows in each input
 A new query execution operator, ROW REPLICATE, generates
the required copies of each row
SELECT description FROM product
EXCEPT ALL
SELECT description FROM product as p2 WHERE quantity < 15
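Counting duplicates and then replicating rows is easy to model with multisets. In the sketch below, Python's collections.Counter stands in for the implicit aggregation and Counter.elements() for the ROW REPLICATE step. The product descriptions are hypothetical, and None models NULL: it compares equal to itself, matching the way these set operators treat NULLs as equivalent.

```python
from collections import Counter


def except_all(left, right):
    """Bag difference: a row seen l times on the left and r times on the right survives max(l - r, 0) times."""
    # Counter subtraction drops rows whose count falls to zero or below.
    return list((Counter(left) - Counter(right)).elements())


def intersect_all(left, right):
    """Bag intersection: a row survives min(l, r) times."""
    return list((Counter(left) & Counter(right)).elements())


def except_distinct(left, right):
    """Set difference: duplicates are removed first (the default DISTINCT form)."""
    return set(left) - set(right)


products = ["Tee Shirt", "Tee Shirt", "Tee Shirt", "Cap", "Cap", None, None]
low_stock = ["Tee Shirt", "Cap", "Cap", None]

print(except_all(products, low_stock))
print(intersect_all(products, low_stock))
print(except_distinct(products, low_stock))
```

The last line shows why the ALL variants exist: EXCEPT (DISTINCT) finds nothing here, while EXCEPT ALL reports two surplus Tee Shirt rows and one surplus NULL.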
18
GROUP BY ROLLUP
 Computes aggregates as usual, but result set contains
multiple sets of groups
 Logically, grouping is performed N+1 times for N grouping
expressions
 Essentially implements the functionality of COBOL Report
Writer in a single SQL request
SELECT state, zip, count(*), grouping(zip), grouping(state)
FROM customer
GROUP BY ROLLUP (state, zip)
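The N+1 grouping passes are the prefixes of the ROLLUP column list. The Python sketch below enumerates them and counts rows per group over a few hypothetical customer rows. A column outside the current grouping set is reported as None with a GROUPING flag of 1; the flags are listed in column order (state, zip), the reverse of the query's select list.

```python
from collections import Counter


def rollup_grouping_sets(columns):
    """ROLLUP(c1, ..., cN) groups by every prefix of the column list: N+1 grouping sets."""
    return [tuple(columns[:k]) for k in range(len(columns), -1, -1)]


def count_by_grouping_sets(rows, columns, grouping_sets):
    """Emulate SELECT <columns>, COUNT(*), GROUPING(...) for each grouping set.

    A column outside the current grouping set is reported as None (NULL) with a
    GROUPING flag of 1, which is how a rolled-up NULL is told apart from real NULL data.
    """
    results = []
    for grouping_set in grouping_sets:
        flags = tuple(0 if c in grouping_set else 1 for c in columns)
        counts = Counter(
            tuple(row[c] if c in grouping_set else None for c in columns) for row in rows
        )
        for key, count in counts.items():
            results.append((key, count, flags))
    return results


customers = [  # hypothetical rows from a customer table
    {"state": "CA", "zip": "94105"},
    {"state": "CA", "zip": "94105"},
    {"state": "CA", "zip": "90001"},
    {"state": "NY", "zip": "10001"},
]
sets = rollup_grouping_sets(["state", "zip"])  # [('state', 'zip'), ('state',), ()]
for key, count, flags in count_by_grouping_sets(customers, ["state", "zip"], sets):
    print(key, count, flags)
```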
19
GROUP BY CUBE
 Computes aggregates as usual, but result set contains the
power set of the N grouping expressions
 Expensive to execute for large N
 Result can be restricted through the specification of
GROUPING SETS
SELECT state, zip, count(*), grouping(zip), grouping(state)
FROM customer
GROUP BY CUBE (state, zip)
SELECT state, zip, count(*), grouping(zip), grouping(state)
FROM customer
GROUP BY GROUPING SETS ( (state, zip), state, zip, () )
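Because CUBE enumerates the power set, the GROUPING SETS clause in the last query lists exactly the four sets that CUBE (state, zip) generates; naming only some of them is how the result is restricted. The Python sketch below generates the sets and shows the 2**N growth that makes CUBE expensive for large N.

```python
from itertools import combinations


def cube_grouping_sets(columns):
    """CUBE(c1, ..., cN): every subset of the grouping columns, 2**N grouping sets in all."""
    return [combo
            for size in range(len(columns), -1, -1)
            for combo in combinations(columns, size)]


print(cube_grouping_sets(["state", "zip"]))
# The growth that makes CUBE expensive for large N:
for n in (2, 4, 8, 12):
    print(n, "columns ->", len(cube_grouping_sets([f"c{i}" for i in range(n)])), "grouping sets")
```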
20
WINDOW functions
 Part of SQL OLAP extensions
 Computes aggregates (except LIST) over a window of rows
 Provides an ANSI-compliant way to number the rows of a result set
 ROW_NUMBER() rather than NUMBER(*)
 Useful to:
 Compute cumulative aggregates, or “moving averages”
 Eliminate the need for correlated subqueries involving aggregation
21
WINDOW functions
List employees, by department, in four US states by their start
dates, along with their cumulative salaries:
SELECT dept_id, emp_lname, start_date, salary,
SUM(salary) OVER (PARTITION BY dept_id ORDER BY start_date
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS "Sum_Salary"
FROM employee
WHERE state IN ('CA', 'UT', 'NY', 'AZ') AND dept_id IN ('100', '200')
ORDER BY dept_id, start_date;
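The window in this query restarts its running total for each department and, because the frame is defined with RANGE, treats employees with the same start date as peers sharing one frame. The Python sketch below reproduces that computation over hypothetical employee rows; it is an illustration of the semantics, not of how the server evaluates windows.

```python
from itertools import groupby
from operator import itemgetter


def cumulative_sum_range(rows, partition_key, order_key, value_key, out_key):
    """SUM(value) OVER (PARTITION BY p ORDER BY o
                        RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).

    With RANGE framing, rows that tie on the ORDER BY value (peers) share one
    frame, so every peer receives the same running total.
    """
    results = []
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    for _, partition in groupby(ordered, key=itemgetter(partition_key)):
        running_total = 0  # restarts for each PARTITION BY group
        for _, peer_group in groupby(partition, key=itemgetter(order_key)):
            peers = list(peer_group)
            running_total += sum(row[value_key] for row in peers)
            results.extend({**row, out_key: running_total} for row in peers)
    return results


employees = [  # hypothetical employee rows; ISO dates sort correctly as strings
    {"dept_id": "100", "emp_lname": "Adams", "start_date": "1990-01-01", "salary": 50000},
    {"dept_id": "100", "emp_lname": "Baker", "start_date": "1991-06-01", "salary": 40000},
    {"dept_id": "100", "emp_lname": "Chin", "start_date": "1991-06-01", "salary": 30000},
    {"dept_id": "200", "emp_lname": "Diaz", "start_date": "1989-03-15", "salary": 60000},
]
for row in cumulative_sum_range(employees, "dept_id", "start_date", "salary", "Sum_Salary"):
    print(row["dept_id"], row["emp_lname"], row["start_date"], row["Sum_Salary"])
```

With a ROWS frame instead of RANGE, Baker and Chin would receive different running totals (90000 and 120000, in input order), so the frame unit matters whenever ORDER BY values tie.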
22
WINDOW functions
List all orders (with part information) where the part quantity
cannot cover the maximum single order for that part:
SELECT o.id, o.order_date, p.*
FROM sales_order o, sales_order_items s, product p
WHERE o.id = s.id and s.prod_id = p.id
and p.quantity < (SELECT max(s2.quantity)
FROM sales_order_items s2
WHERE s2.prod_id = p.id)
ORDER BY p.id, o.id
SELECT order_qty.id, o.order_date, p.*, max_q
FROM ( SELECT s.id, s.prod_id,
MAX(s.quantity) OVER (partition BY s.prod_id order by s.prod_id) AS max_q
FROM sales_order_items s) as order_qty,
product p,
sales_order o
WHERE p.id = prod_id and o.id = order_qty.id and p.quantity < max_q
ORDER BY p.id, o.id
23
WINDOW functions
Find the salespeople with the best sales (total amount) for each
product, including ties:
SELECT s.prod_id, o.sales_rep, SUM(s.quantity) as total_quantity, SUM(s.quantity * p.unit_price) as total_sales
FROM sales_order o KEY JOIN sales_order_items s KEY JOIN product p
GROUP BY s.prod_id, o.sales_rep
HAVING total_sales = (SELECT FIRST SUM(s2.quantity * p2.unit_price) as sum_sales
FROM sales_order o2 KEY JOIN sales_order_items s2 KEY JOIN product p2
WHERE s2.prod_id = s.prod_id
GROUP BY o2.sales_rep
ORDER BY sum_sales DESC )
ORDER BY s.prod_id
SELECT v.prod_id, v.sales_rep, v.total_quantity, v.total_sales
FROM ( SELECT o.sales_rep, s.prod_id, SUM(s.quantity) as total_quantity,
SUM(s.quantity * p.unit_price) as total_sales,
RANK() OVER (PARTITION BY s.prod_id
ORDER BY SUM(s.quantity * p.unit_price) DESC) as sales_ranking
FROM sales_order o KEY JOIN sales_order_items s KEY JOIN product p
GROUP BY o.sales_rep, s.prod_id ) as v
WHERE sales_ranking = 1
ORDER by v.prod_id
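The second formulation works because RANK() gives tied rows the same rank, so filtering on sales_ranking = 1 keeps every salesperson who shares first place. The Python sketch below applies RANK semantics to hypothetical per-product totals of the kind the inner GROUP BY produces.

```python
from itertools import groupby
from operator import itemgetter


def rank_desc(rows, partition_key, order_key, out_key):
    """RANK() OVER (PARTITION BY p ORDER BY o DESC).

    Ties share a rank, and the next distinct value skips ahead (1, 1, 3, ...).
    Assumes order_key holds numbers, so negation gives descending order.
    """
    results = []
    ordered = sorted(rows, key=lambda row: (row[partition_key], -row[order_key]))
    for _, partition in groupby(ordered, key=itemgetter(partition_key)):
        rows_before = 0
        for _, peer_group in groupby(partition, key=itemgetter(order_key)):
            peers = list(peer_group)
            results.extend({**row, out_key: rows_before + 1} for row in peers)
            rows_before += len(peers)
    return results


totals = [  # hypothetical output of the GROUP BY o.sales_rep, s.prod_id step
    {"prod_id": 300, "sales_rep": 129, "total_sales": 900},
    {"prod_id": 300, "sales_rep": 195, "total_sales": 900},
    {"prod_id": 300, "sales_rep": 299, "total_sales": 500},
    {"prod_id": 301, "sales_rep": 129, "total_sales": 700},
    {"prod_id": 301, "sales_rep": 467, "total_sales": 1200},
]
ranked = rank_desc(totals, "prod_id", "total_sales", "sales_ranking")
best = [(r["prod_id"], r["sales_rep"]) for r in ranked if r["sales_ranking"] == 1]
print(best)
```

Both reps tied at 900 for product 300 survive the filter, which the correlated-subquery version achieves only by recomputing the best total once per group.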
24
Data Management in 9.0.2
25
Moving to ASA 9.0.2
 If database is 8.0.2, unload/reload to 9.0 is largely
unnecessary
 DBUPGRAD to 9.0 required for some catalog schema changes, in
particular for the Index Consultant
 There should be no consequences of using DBUPGRAD with
respect to performance
 However:
 only 9.0 format databases support named constraints
 only 9.0 format databases support cache warming
 only 9.0.1 databases support page checksums
 8.0.2 databases do not support index statistics collection by default
 Can be turned on when creating the database via CREATE
DATABASE (but not dbinit)
26
Moving to ASA 9.0.2
Otherwise (from 8.0.1 or 8.0.0), unload/reload is recommended to gain:
 Clustered index support
 Better statistics management
 Improved histogram organization, statistics collection
 Index statistics kept persistent in the database file
 Improved histograms
 Cache warming on startup
 Checksums on database pages
 PCTFREE option for base and temporary tables
27
Moving to SQL Anywhere “Jasper”
 The Jasper release of the SQL Anywhere server will not
support older database formats
 Jasper will ship with a migration tool to convert an existing
database into a Jasper-format database
28
Database organization
 A database consists of up to 13 “dbspaces”
 Maximum size of each dbspace is limited by the underlying operating
system
 Maximum database size is also determined by page size
 Limit for any dbspace is 2**28 (256 million) pages
 Each dbspace, the temporary file, and the transaction log is a simple
OS file
 Ease of administration, backup
 Temporary file is used for temporary tables
 A dbspace file grows in 256K extents (512K with 16K pages, 1MB with 32K pages)
 Database files can be copied to/from different endian machines
 Can copy database from Wintel to big-endian UNIX systems and back
again
 Server automatically does data conversion where necessary
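The page-count limit turns into a byte limit once a page size is chosen. The Python sketch below does that arithmetic using only the figures quoted above (2**28 pages per dbspace, up to 13 dbspaces, 256K/512K/1MB growth extents). It gives theoretical ceilings that assume the operating system permits files that large; the constant and function names are illustrative.

```python
MAX_PAGES_PER_DBSPACE = 2 ** 28   # per-dbspace page limit quoted on the slide
MAX_DBSPACES = 13                 # dbspaces per database, also from the slide


def max_dbspace_bytes(page_size_kb: int) -> int:
    """Upper bound on one dbspace file, before any operating-system file-size limit."""
    return MAX_PAGES_PER_DBSPACE * page_size_kb * 1024


def growth_extent_kb(page_size_kb: int) -> int:
    """Size of each file extension: 256K, or 512K / 1MB for 16K / 32K pages."""
    return {16: 512, 32: 1024}.get(page_size_kb, 256)


TIB = 2 ** 40
for kb in (1, 2, 4, 8, 16, 32):
    per_dbspace = max_dbspace_bytes(kb)
    print(f"{kb:>2}K pages: {per_dbspace / TIB:g} TiB per dbspace, "
          f"{MAX_DBSPACES * per_dbspace / TIB:g} TiB across {MAX_DBSPACES} dbspaces, "
          f"grows {growth_extent_kb(kb)}K at a time")
```

With the recommended 4K page size, for example, each dbspace tops out at 1 TiB.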
29
Database organization
 A database file contains:
 table pages
 index pages
 free pages
 rollback log pages
 checkpoint log pages
 Each dbspace for a database must use the same page size
30
Physical organization: tables
 Each table uses an independent set of table pages
 Each table allocates at least one page, even if the table is empty
 Server maintains bit-maps for table pages
 Supports clustering of table pages in the same portion of the
database file
 Facilitates large-block I/O – SQL Anywhere reads 64K at a time
when doing sequential scans
 Result: considerably faster sequential scan performance
31
Physical organization: tables
 New in 8.0.2: ‘scattered read’ support on Windows 2000 and
Windows XP
 Another mainframe technology being reinvented on PC/UNIX
servers
 aka “locate-mode I/O”
 Improves performance, reduces memory requirements
 Coming to other platforms as vendors implement it
 Tables cannot span dbspaces
 Each secondary index on a table can be stored in a separate
dbspace
 Recommended if multiple spindles are available (not necessary
for RAID devices)
 Partition dbspaces on separate devices whenever possible
 Brings more disk arms to bear, reducing seek latency
32
Physical organization: tables
 Rows are inserted into pages at a point where, if at all
possible, the entire row can be stored contiguously
 Caveat: row segments are at most 4K; second or subsequent row
segments can appear on different pages
 Columns are packed tightly together; only unpadded values
are stored on disk
 Primary key columns are always at the beginning of each row,
in sequence
 Server may rewrite all rows if PK added or modified
 Rows can be of (almost) unlimited size; are split across pages
where necessary
 Maximum length of any column is 2 GB
 Maximum number of rows per page is 255
33
Physical organization: tables
 Rows are not guaranteed to be placed in pages corresponding to
their insertion order
 By default, ASA uses a first-fit algorithm for page selection
 To guarantee ordering of a result set, specify an ORDER BY clause
 Space is not reserved for columns that are null
 BLOB values are stored in a separate “arena” of pages
 First 255 bytes are stored together with the row
 Access to the rest of the BLOB value will almost certainly require a SEEK
 Implications for choice of page size
 Once inserted, a row identifier is immutable
 An updated row must be split if its new length does not allow it to fit on
the page
34
Physical organization: tables
 Table pages are allocated in 8-page clusters; the number of clusters
allocated at a time depends on page size
 2K: grow 4 clusters at a time
 4K: grow 2 clusters at a time
 All other page sizes: one cluster at a time
 ASA will re-use database pages for additional inserts if entire pages
are freed
 Defaults: free space is 100 bytes for 1K pages and 200 bytes for all other page sizes
 DBA can specify freespace percentage to accommodate future table
UPDATEs using PCTFREE
 PCTFREE characteristic stored in new catalog table SYSATTRIBUTE
(and corresponding table SYSATTRIBUTENAME)
 Can be specified for temporary tables
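The cluster rules above fix how much a table grows per allocation, and the default free-space reserve fixes how much of each page is held back. The Python sketch below works both out from the quoted figures, reading the defaults as per-page reserves (an assumption). For 2K, 4K and 8K pages every growth step comes to 64K, which lines up with the 64K reads used for sequential scans.

```python
PAGES_PER_CLUSTER = 8


def clusters_per_growth(page_size_kb: int) -> int:
    """Clusters added per allocation: 4 for 2K pages, 2 for 4K pages, otherwise 1."""
    return {2: 4, 4: 2}.get(page_size_kb, 1)


def table_growth_kb(page_size_kb: int) -> int:
    """Kilobytes added to a table each time it needs more pages."""
    return PAGES_PER_CLUSTER * clusters_per_growth(page_size_kb) * page_size_kb


def default_free_space_pct(page_size_kb: int) -> float:
    """Default free space per page: 100 bytes on 1K pages, 200 bytes otherwise."""
    free_bytes = 100 if page_size_kb == 1 else 200
    return 100 * free_bytes / (page_size_kb * 1024)


for kb in (1, 2, 4, 8, 16, 32):
    print(f"{kb:>2}K pages: grows {table_growth_kb(kb)}K at a time, "
          f"default free space {default_free_space_pct(kb):.1f}%")
```

The shrinking percentage on larger pages is one reason to set PCTFREE explicitly for tables whose rows grow after insertion.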
35
Page sizes
 Page sizes supported are 1K, 2K, 4K, 8K, 16K, 32K
 2K page size minimum on all UNIX platforms
 Default changed to 2K in the 6.0.3 release
 A server can support several databases concurrently
 Buffer pool page size will be the largest database page size specified
on the command line
 Consider tradeoffs with your choice of page size
 4K recommended; occasionally 8K may offer improved
performance
 Default will change to 4K with Jasper release
 Do not use 16K or 32K pages unless you have a specialty
application
 In typical environments, large page sizes cause inefficient use of cache
36
Choice of page size does matter
 Larger rows usually call for larger pages (fewer split rows)
 Random retrieval performance is dependent on the application
 Larger pages can pollute the cache with unnecessary data
 Often require larger buffer pools to accommodate the application’s
working set
 Smaller pages are more cache efficient, but
 Smaller pages reduce index fanout, and can increase index depth
37
Choice of page size does matter
 Don’t ignore index maintenance costs when considering page
size (larger page sizes can mean increased cache pressure)
 Test your application with different alternatives
 Your mileage may vary
 A 4K page size is a typical choice for many applications
 My recommendation: use 4K pages unless thorough testing
proves that a different page size offers better
performance/scalability
 See data storage whitepaper
 Available at www.ianywhere.com/developer
 Recently updated for 9.0.0
38
Physical organization: indexes
 ASA 9.0 supports two different types of indexes:
 Hash-based
 Key is a one-way order-preserving encoding of at most nine bytes
of the data values
 Hash-based indexes are still used when the key length does not
satisfy the limits for compressed indexes
 Compressed
 Contains Patricia tries in the index’s internal nodes
 Used for keys > 10 bytes and less than
 122 bytes with 1K pages
 248 bytes for all other page sizes
 Substantially improved performance with larger keys
39
Physical index organization: hash-based indexes
 Values in an index are “hashed” into a key of at most 10 bytes
using an order-preserving encoding function
 WITH HASH SIZE is deprecated
 Each indexed column encoded separately, with a one-byte
length
 A 10-byte hash value can hold two 32-bit integer values (including
two length bytes)
 Hash values in an index are stored separately from the index
entry itself
 The hash value for an identical secondary key is shared for
each index entry (row) in that index page
 This improves fanout when data distribution is skewed
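An order-preserving encoding lets the index compare short byte strings instead of the original values. The Python sketch below is a hypothetical encoding, not the server's actual hash function: each 32-bit integer column gets a one-byte length followed by four big-endian bytes with the sign bit flipped. Two columns therefore fit in exactly 10 bytes, and sorting the encoded keys gives the same order as sorting the original tuples.

```python
import struct


def encode_int32(value: int) -> bytes:
    """One length byte, then four big-endian bytes with the sign bit flipped.

    Adding 2**31 maps -2**31..2**31-1 onto 0..2**32-1, so comparing the
    bytes as unsigned strings gives the same answer as comparing the integers.
    """
    return bytes([4]) + struct.pack(">I", value + 2 ** 31)


def encode_key(*values: int) -> bytes:
    """Encode each column separately and concatenate: two int32 columns use 10 bytes."""
    return b"".join(encode_int32(v) for v in values)


keys = [(3, -1), (-5, 7), (3, 2), (0, 0), (-5, -9), (2 ** 31 - 1, -(2 ** 31))]
by_encoding = sorted(keys, key=lambda k: encode_key(*k))
print(by_encoding == sorted(keys), len(encode_key(1, 2)))
```

This encoding is lossless for integers. A key too long for the hash value would have to be truncated, and ties among truncated values would then need a base-row fetch to resolve.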
40
Physical index organization:
Compressed indexes
 Internal nodes in the index contain a Patricia trie
 PATRICIA: Practical Algorithm to Retrieve Information Coded
in Alphanumeric (D. R. Morrison, J. ACM Vol. 15, 1968)
 Combines a binary trie with an optimization to skip over bit
comparisons that would result from one-way branching
 Result: automatic compression of string data
 Excellent fanout of internal nodes
 Common substrings of key values have a negligible impact on
space requirements and performance
 Superb performance improvements in many cases, especially
with composite primary and foreign keys
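The crit-bit tree below is a compact binary PATRICIA variant. Each internal node records only the index of the single bit that separates its two subtrees, so long runs of shared bits are skipped rather than compared node by node. It is an in-memory Python sketch with hypothetical keys, not the on-disk structure ASA uses. Keys are byte strings without NUL bytes, so no key can collide with another key's zero-padded form.

```python
from dataclasses import dataclass
from typing import Iterator, Optional, Union


def bit(key: bytes, i: int) -> int:
    """Bit i of key, most significant bit first; positions past the end read as 0."""
    byte_pos = i // 8
    if byte_pos >= len(key):
        return 0
    return (key[byte_pos] >> (7 - i % 8)) & 1


def first_diff_bit(a: bytes, b: bytes) -> int:
    """Index of the first bit where a and b differ."""
    for i in range(8 * max(len(a), len(b))):
        if bit(a, i) != bit(b, i):
            return i
    raise ValueError("keys must be distinct")


@dataclass
class Leaf:
    key: bytes


@dataclass
class Node:
    bit_index: int   # the only bit examined at this node
    children: list   # [subtree where the bit is 0, subtree where it is 1]


Tree = Optional[Union[Leaf, Node]]


def insert(root: Tree, key: bytes) -> Tree:
    if b"\x00" in key:
        raise ValueError("NUL bytes are not supported in this sketch")
    if root is None:
        return Leaf(key)
    # 1. Follow key's bits down to the leaf it would be compared against.
    node = root
    while isinstance(node, Node):
        node = node.children[bit(key, node.bit_index)]
    if node.key == key:
        return root  # already present
    # 2. The critical bit is the first bit where key and that leaf differ.
    crit = first_diff_bit(node.key, key)
    # 3. Descend again, stopping above the first node that tests a later bit.
    parent, side, cur = None, 0, root
    while isinstance(cur, Node) and cur.bit_index < crit:
        parent, side = cur, bit(key, cur.bit_index)
        cur = cur.children[side]
    # 4. Splice in one new node that tests only the critical bit.
    new_leaf = Leaf(key)
    children = [new_leaf, cur] if bit(key, crit) == 0 else [cur, new_leaf]
    new_node = Node(crit, children)
    if parent is None:
        return new_node
    parent.children[side] = new_node
    return root


def contains(root: Tree, key: bytes) -> bool:
    node = root
    while isinstance(node, Node):
        node = node.children[bit(key, node.bit_index)]
    return node is not None and node.key == key


def keys_in_order(root: Tree) -> Iterator[bytes]:
    """In-order walk (0 side first) yields keys in ascending byte order."""
    if isinstance(root, Leaf):
        yield root.key
    elif isinstance(root, Node):
        for child in root.children:
            yield from keys_in_order(child)


def depth(root: Tree) -> int:
    if not isinstance(root, Node):
        return 0
    return 1 + max(depth(child) for child in root.children)


keys = [b"customer-000017", b"customer-000042", b"customer-000099", b"invoice-1"]
root: Tree = None
for k in keys:
    root = insert(root, k)

print(list(keys_in_order(root)) == sorted(keys))
print(contains(root, b"customer-000042"), contains(root, b"customer-0000"))
# The 13-byte shared prefix "customer-0000" adds no internal nodes:
print(depth(root))
```

Four keys need only three internal nodes however long their common prefix is, which is the sense in which common substrings have negligible impact.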
41
Clustered index support
 First offered with the 8.0.2 release
 At most one clustered index per table (may be a temporary table)
 May be secondary index, PK, FK, UNIQUE constraint
 Optimizer assumes PK indexes are clustered unless a different
clustering index exists
 Engine will not attempt to maintain clustering on PK indexes unless
they are declared CLUSTERED
 May be hash or compressed index
 Clustering characteristic stored in SYSATTRIBUTE catalog table
 CLUSTERED keyword can be used in both CREATE INDEX and
CREATE/ALTER TABLE statements
 However, ALTER does not reorganize the table; use REORGANIZE
TABLE
42
Clustered index support
 On INSERT/LOAD TABLE, server attempts to keep rows
physically adjacent in base table pages
 Specification of PCTFREE on LOAD can be critical
 Adjacency is NOT guaranteed; ORDER BY still requires a
physical sort or indexed retrieval
 Can significantly improve performance
 Optimizer costs clustered index access differently
 Consider their use with queries that involve range predicates
 Often useful with DATE or TIMESTAMP columns
 Use REORGANIZE TABLE or UNLOAD/RELOAD if clustering
degrades over time
 ALTER INDEX statement can rename an index or change its
clustering attribute
43
Physical index organization: fanout
and page size
 Fanout refers to the number of index entries on a page
 Lower fanout means greater index depth, and hence more
costly random retrieval
 Fanout is affected by
 Page size
 Hash value size/trie compression
 Distribution of key values
 Index maintenance
 Fanout can degrade over time
 sa_index_density() procedure
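A simplified model makes the fanout/depth relationship concrete: if every index page holds f entries, L levels can address f**L keys, so the depth is the smallest L with f**L >= N. The fanout values in the Python sketch below are hypothetical; sa_index_levels() and sa_index_density() report the real figures for a given index.

```python
def index_levels(entries: int, fanout: int) -> int:
    """Smallest number of levels L with fanout**L >= entries (simplified B-tree model)."""
    if entries < 1 or fanout < 2:
        raise ValueError("need at least one entry and a fanout of 2 or more")
    levels, capacity = 1, fanout
    while capacity < entries:
        levels += 1
        capacity *= fanout
    return levels


# Ten million keys: a degraded index (fanout 100) versus a dense one (fanout 400).
for fanout in (100, 400):
    print(fanout, index_levels(10_000_000, fanout))
```

Each extra level is one more page visited on every random lookup, and one more page read from disk whenever that page is not already in cache.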
44
Indexes and query processing
 ASA does not store actual data values in the index
 This implies that each base row must be retrieved to
 fetch the values of any attributes, or
 compare keys longer than the maximum hash value size
 Indexes are automatically created to enforce referential
integrity
 Primary keys, foreign keys, unique constraints
 All related indexes must be the same type (hash or compressed)
 Maximum number of indexes is dependent on page size
 <= 4K: 2048 indexes
 8K: 1024 indexes
 16K: 512 indexes
 32K: 256 indexes
45
Indexes and query processing
 Each indexed column can be ascending or descending
 Index is scanned backwards if the application scrolls in the
opposite direction, or an ORDER BY clause specifies the reverse
sequence
 Support for merge and hash joins means that ASA will often
use sequential scans, rather than indexed retrieval
46
REORGANIZE Statement – base tables
 REORGANIZE TABLE tablename
 Defragments rows on-the-fly by removing/inserting groups of
rows in clustered index (or PK) order
 An exclusive lock is held on the table while each group is processed, and checkpoints are suspended for that interval; commits occur periodically so that other applications can run
 Performs implicit COMMITs during operation
 Rows will be in clustered sequence when the operation is complete (except possibly rows modified by concurrent UPDATEs)
 Use new procedure sa_table_fragmentation() to discover
tables that warrant reorganization
47
REORGANIZE Statement - indexes
 REORGANIZE TABLE tablename [ index specification ]
 INDEX indexname
 FOREIGN KEY indexname
 PRIMARY KEY
 Exclusive lock is held throughout
 CHECKPOINTs are suspended
 Reclaims space lost to update activity
 Re-balances the index, especially important after many
DELETE operations
 Use the new procedure sa_index_density() to identify indexes
that require reorganization
48
Data management improvements in
9.0.1
 Better scalability – new lock-free cache manager
 Substantially better performance across the board
 Support for page checksums
 New option for dbinit and CREATE DATABASE statement
 Supported by dbvalid utility, and a new statement VALIDATE
CHECKSUM
 Overhead: largely depends on CPU speed. Examples:
 2.8 milliseconds per I/O for 32K pages
 0.7 milliseconds per I/O for 8K pages
 Improvements to dynamic cache sizing
 Sampling rate changes with database growth or the starting of a
new database on the same server
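Page checksums follow a simple pattern: compute a checksum when the page is written, store it with the page, and recompute it when the page is read back. The Python sketch below uses zlib's CRC-32 and a 4-byte trailer as stand-ins; the slides do not say which checksum ASA computes or where on the page it is stored.

```python
import os
import zlib

PAGE_SIZE = 4096   # hypothetical 4K database page
TRAILER = 4        # hypothetical: last 4 bytes of the page hold a CRC-32


def seal_page(payload: bytes) -> bytes:
    """Append a CRC-32 of the payload before the page is written."""
    if len(payload) != PAGE_SIZE - TRAILER:
        raise ValueError("payload must fill the page except for the trailer")
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def page_is_valid(page: bytes) -> bool:
    """Recompute the checksum after the page is read and compare."""
    payload, stored = page[:-TRAILER], page[-TRAILER:]
    return zlib.crc32(payload).to_bytes(4, "big") == stored


page = seal_page(os.urandom(PAGE_SIZE - TRAILER))
damaged = bytearray(page)
damaged[100] ^= 0x01   # a single flipped bit, as a faulty disk or controller might cause
print(page_is_valid(page), page_is_valid(bytes(damaged)))
```

Because the checksum covers every byte of the page, its cost scales with page size, which is consistent with the quoted overheads: 2.8 ms for 32K pages is four times 0.7 ms for 8K pages.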
49
Data management improvements in
9.0.1
 Database cache warming feature
 Two operational phases, collection and reload
 During collection, page IDs are saved in the database as they are
accessed at startup
 During reload, collected page IDs are read into cache as
background processing
 Checks and balances used to prevent swamping the server with I/O
during server startup
 Need to test performance before deploying
 Cache warming is *enabled* by default
50
Data management improvements in
9.0.1
 Optimistic locking introduced for WAIT_FOR_COMMIT
 Controlled by a new connection option
OPTIMISTIC_WAIT_FOR_COMMIT
 Temporary dbspace can be grown with ALTER DBSPACE
 Can improve performance of complex queries by ensuring that the temp
file is not fragmented on disk
 Size of temporary dbspace can be controlled with a governor
 New public option TEMP_SPACE_LIMIT_CHECK (default OFF)
 When OFF, engine’s default behaviour is to die with a DISK FULL error
 Jasper release: default is ON
 Server computes a temp space quota for each request; if quota is
exceeded and temporary dbspace is at least 80% of its maximum size,
request fails with SQLSTATE 54W05
 Quota computed using amount of disk free space on that partition, and
number of active connections
 Shipped in 9.0.0 build 1308, 9.0.1 build 1872, 8.0.3 build 4991
51
Data management improvements in
9.0.1
 ALTER INDEX statement
 Can rename an index, or alter its clustering attribute
 Ability to create an index on a function
 Automatically adds a computed column “column-name” to the
table
 Creates an index on the computed column
 Relies on the optimizer to replace any function occurrences with
the computed column
CREATE INDEX index-name
ON [owner.]table-name ( function( arg [, ...] ) AS column-name )
[{IN | ON} dbspace-name]
52
Data management improvements in
9.0.1
 Non-transactional temporary tables
 Unaffected by COMMIT or ROLLBACK; no entries made to
rollback log
 Procedure, trigger, and view text can be hidden from other
users by using SET HIDDEN (8.0.2)
 LOAD TABLE enhancements:
 can be used on local temporary tables (8.0.2)
 ORDER clause (8.0.2)
 Control over which column histograms are built (9.0.0)
53
Data management improvements in
9.0.1
 DEDICATED_TASK option (DBA-only, temporary only)
 UUIDs and GUIDs can be used as surrogate keys - see
newid() function (8.0.2)
 XML data type
 SYSHISTORY system table
 Statistics (depth, leaf pages) maintained on indexes in real
time (introduced in 8.0.2EBF)
 Hash(), compress(), encrypt() built-in functions
 Can be used to compress or encrypt individual string or binary
fields in the database
 Values can be viewed, processed with decrypt() and
decompress() functions
54
Data management improvements in
9.0.1
 ALTER DATABASE can now modify transaction log identically to
DBLOG utility
 BACKUP and DBBACKUP can now rename the log copy
 ALTER VIEW WITH RECOMPILE
 Event handling improvements:
 Two new parameters for event_parameter:
 APPINFO
 DisconnectReason: ‘from client’, ‘drop connection’, ‘liveness’, ‘inactive’,
‘connect failed’
 New cost model for Ultralite requests
 New DTT function based on analysis of several current models of pocket
PC devices
 Equates random and sequential I/O to produce better Ultralite query
plans
55
Data management improvements in
9.0.2
 Temporary stored procedures
 they are visible only to the connection that creates them, and
are automatically dropped when the connection is dropped.
 they can be explicitly dropped, but may not be ALTERed.
 GRANT and REVOKE are not permitted on temporary
procedures.
 they are not recorded in the catalog or in the transaction log
 they can be created and dropped when connected to a read-only
database
 a procedure owner cannot be specified for temporary procedures.
Rather, they are owned by the user that creates them.
 temporary external procedures are not permitted
 temporary procedures execute with the permissions of their
creator (i.e. the current user)
56
Data management improvements in
9.0.2
 CREATE LOCAL TEMPORARY TABLE
 defines a local temporary table which will persist until the end of a
connection, or until the table is explicitly dropped.
 Intended for use inside procedures, functions, triggers
 Similar to DECLARE LOCAL TEMPORARY table if executed outside of a
procedure context
 UUIDs are now a native SQL Anywhere type
 UUID_HAS_HYPHENS option
 Controls formatting of UUIDs (UniqueIdentifier values) when converted to
strings
 Disk-full callback support
 MIN_TABLE_SIZE_FOR_HISTOGRAM is deprecated
 New option COLLECT_STATISTICS_ON_DML_UPDATES
 New option LOG_DEADLOCKS, sa_report_deadlocks() procedure
 Enhancements to START DATABASE statement: WITH DISTINCT SQLSTATE
57
Application profiling improvements in
9.0.2
 Procedure profiling can now be performed for an individual
connection or user
 call sa_server_option('Profile_connection',<connection-id>)
 call sa_server_option('ProfileFilterUser','<userid>')
 Request-level logging enhancements:
 New -zn switch to retain n log files in a ring
 Or use sa_server_option('RequestLogNumFiles',<n>)
 Can log either text or the plan for expensive queries (9.0.2EBF)
 -zx <cost> specifies a threshold cost; a statement whose cost exceeds it at either optimization or execution time is logged
 Call sa_server_option('LogExpensiveQueries')
 When -zp is also specified, the plans are output; otherwise, only the statement text is logged
58
Physical database design tips
59
Physical database design tips: file
placement
 Database file placement
 Place transaction log, database file(s), and temporary directory on
separate devices if possible
 if using mirrored logging, ensure the two logs are on different
physical disks
 Temporary file placement can dramatically affect performance
of complex queries
 Use the ASTMP environment variable to specify location for
temporary file
 Place on a different physical drive if possible
 The more disk heads the better (RAID)
60
Physical database design tips: file
placement
 Consider the use of caching disk controllers/NT striping/RAID
 Consider the tradeoffs
 Software striping offers better performance, but offers no recovery
advantages
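 RAID 5 tends to have poor write latency: each small write turns into four I/O requests (read old data, read old parity, write new data, write new parity), and the writes cannot start until the reads complete
 Not good for a transaction log
 RAID 10 (1+0) offers much better performance, at the cost of extra disks (half the raw capacity holds mirror copies)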
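The RAID 5 write penalty comes from parity maintenance: rewriting one data block also means rewriting the stripe's parity block, and the new parity depends on both old values. The Python sketch below models that read-modify-write with byte-wise XOR over a hypothetical three-data-disk stripe and checks it against recomputing parity from scratch.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """New parity after rewriting one data block: P' = P xor D_old xor D_new.

    The controller must first READ D_old and READ P, and only then can it
    WRITE D_new and WRITE P': four I/Os for one logical write.
    """
    return xor_blocks(old_parity, old_data, new_data)


# A hypothetical three-data-disk stripe with 4-byte blocks.
d0, d1, d2 = b"\x10\x20\x30\x40", b"\x0f\x0e\x0d\x0c", b"\xaa\xbb\xcc\xdd"
parity = xor_blocks(d0, d1, d2)
new_d1 = b"\x01\x02\x03\x04"
new_parity = raid5_small_write(d1, parity, new_d1)
print(new_parity == xor_blocks(d0, new_d1, d2))
```

A transaction log issues many small synchronous writes, so each one pays this read-then-write penalty in full.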
61
File system considerations
 Defragment your file system occasionally, especially after an
unload/reload
 Database file fragmentation is now displayed in the console window
when the database is started
 Preallocate large quantities of space in contiguous chunks through
the ALTER DBSPACE command
 Less problematic with 256K block allocation in recent ASA releases
 ALTER DBSPACE <dbspace-name> ADD nnn {PAGES | KB | MB |
GB | TB}
 Can also do this for the TEMPORARY DBSPACE
 Use db_extended_property() function to determine
fragmentation/size of each dbspace individually (new in 9.0, also in
8.0.2.4215)
 Can be done for temporary dbspace and the transaction log as well
62
File system considerations
 Use caution when trying to run the database over a networked
drive!
 Not all networks and/or operating systems guarantee network
packet ordering
 Physical or logical corruption is likely
 Can use “-r” (read-only) switch if necessary
 SAN units are supported; they guarantee consistent semantics
 Do not use cached filesystem writes unless persistence is
guaranteed
 Corruption is virtually certain and database cannot be recovered;
will need to restore database from backup
63
Database fragmentation
 ASA databases never shrink
 Free pages will be reused for other purposes
 Unload/reload will recover this unused space
 If data is removed in the order it was inserted, fragmentation is
less likely
 Avoid inserts of NULL values followed by updates with actual
data; use PCTFREE if necessary
 Repair fragmentation with unload/reload, or REORGANIZE
TABLE
 Useful tools
 DBINFO -u
 stored procedure sa_table_fragmentation()
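Why "insert NULL, update later" fragments a table, and how PCTFREE or full-size defaults help, can be shown with a toy single-page model. This is not ASA's actual page layout: page headers, row directories and how a split row is stored are ignored, and the row sizes are hypothetical.

```python
PAGE_SIZE = 4096  # bytes; a 4K page, ignoring page headers and row directory


def split_rows(initial: int, final: int, pctfree: int,
               page_size: int = PAGE_SIZE) -> int:
    """Toy model: fill one page with rows of `initial` bytes until only
    `pctfree` percent of it is left, then grow every row to `final` bytes.
    Return how many rows no longer fit and would be split onto another page."""
    fill_limit = page_size * (100 - pctfree) // 100
    rows = fill_limit // initial
    used = rows * initial
    growth = final - initial
    splits = 0
    for _ in range(rows):
        if used + growth <= page_size:
            used += growth   # the row grows in place
        else:
            splits += 1      # no room left: the row is split
    return splits


# Rows inserted with NULLs (40 bytes) and later updated to 100 bytes:
print(split_rows(40, 100, pctfree=0))    # 102: every row on the page splits
print(split_rows(40, 100, pctfree=60))   # 0: the reserve absorbs the growth
# Out-of-range defaults instead of NULL: rows are full-size from the start.
print(split_rows(100, 100, pctfree=0))   # 0
```

The tradeoff is visible too: with PCTFREE 60 the page holds only 40 rows instead of 102, so free space reserved for updates is paid for with a larger table.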
64
Physical database design tips: tables
 Load table data in clustering order (by default, primary key
sequence)
 Sorting automatically performed by DBUNLOAD and by the
REORGANIZE TABLE statement
 New ORDER syntax for (UN)LOAD TABLE
 Use 4K pages unless conditions warrant otherwise
 Watch for ordering, placement of PK columns
 Order in table dictates order in index
 Changed in Jasper!
 Rows are rewritten if the PK columns or the column order are changed
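The benefit of loading rows in clustering order can be illustrated with a simple layout model: count the distinct pages a range query must read when the same 10,000 keys are stored in key order versus arbitrary order. The 50 rows per page and the key range are hypothetical; real page capacity depends on row size and page size.

```python
import random

ROWS_PER_PAGE = 50  # hypothetical: 50 rows fit on one table page


def pages_touched(storage_order: list[int], lo: int, hi: int) -> int:
    """Distinct pages read to fetch every key in [lo, hi], when rows are laid
    out ROWS_PER_PAGE to a page in the order given by `storage_order`."""
    return len({pos // ROWS_PER_PAGE
                for pos, key in enumerate(storage_order)
                if lo <= key <= hi})


keys = list(range(10_000))
clustered = sorted(keys)               # loaded in primary-key (clustering) order
scattered = keys[:]
random.Random(42).shuffle(scattered)   # loaded in arbitrary order

print(pages_touched(clustered, 1000, 1499))  # 10 pages for 500 rows
print(pages_touched(scattered, 1000, 1499))  # far more pages for the same rows
```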
65
Physical database design tips: tables
 Use out-of-range default values instead of NULL
 Reduces page fragmentation with updates
 Can use PCTFREE as an alternative
 Put large columns at end of row; fixed-size and frequently accessed columns near start
 Avoids extra seeks to another table page, required to access split rows
 Choose your data types with care; trade off storage efficiency against
application requirements
 For keys, alphanumeric strings are often more flexible
66
Physical database design tips: indexes
 Compressed indexes prevent many of the problems with
relatively large or composite primary keys
 However:
 Surrogate keys can still be useful
 Usually not a good idea for significant business objects to have
the same key format
 Self-checking keys can simplify business processing
 Watch for opportunities to specify a clustering index
 Especially with date or timestamp columns used in range queries
 Useful stored procedures:
 sa_index_levels()
 sa_index_density()
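One common way to build a self-checking key is to append a check digit. The sketch below uses the Luhn (mod 10) scheme purely as an example; the slides do not prescribe a particular algorithm. Luhn detects every single-digit error and most transpositions of adjacent digits.

```python
def luhn_check_digit(payload: str) -> int:
    """Luhn (mod 10) check digit for a string of decimal digits."""
    total = 0
    # Walk right to left, doubling every other digit starting with the
    # rightmost payload digit (it sits next to the check digit).
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def make_key(payload: str) -> str:
    """Append the check digit to form a self-checking key."""
    return payload + str(luhn_check_digit(payload))


def is_valid_key(key: str) -> bool:
    """Reject keys whose last digit does not match the Luhn check digit."""
    if len(key) < 2 or not all(c in "0123456789" for c in key):
        return False
    return key[-1] == str(luhn_check_digit(key[:-1]))


print(make_key("7992739871"))        # 79927398713
print(is_valid_key("79927398713"))   # True
print(is_valid_key("79927398173"))   # False: adjacent digits transposed
```

A mistyped key can then be rejected by the application (or a simple CHECK) before any lookup, instead of silently matching a different row.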
67
Physical database design tips:
surrogate keys
 Consider surrogate keys when appropriate
 Exploit autoincrement support, or develop self-checking keys
to simplify error detection
 9.0 and 8.0.2 support automatic generation of universally unique
identifiers (UUIDs) as surrogate keys
 Compatible with Microsoft’s implementation
 New native domain: uniqueidentifier in 9.0.2
 No longer necessary to use string conversion functions such as
uuidtostr(); type conversion done automatically
 Weigh their characteristics against those of GLOBAL AUTOINCREMENT
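The UUID versus GLOBAL AUTOINCREMENT tradeoff can be made concrete with a little arithmetic. The sketch assumes the partitioning described in the ASA documentation (a database with global_database_id n and partition size p generates values p*n + 1 through p*(n + 1)); the database ids and the partition size of 1000 are hypothetical. Python's standard `uuid` module stands in for the server's UUID generation.

```python
import uuid


def global_autoincrement_range(global_database_id: int, partition_size: int) -> range:
    """Values a database can generate for a GLOBAL AUTOINCREMENT column,
    assuming the documented partitioning: p*n + 1 through p*(n + 1)."""
    n, p = global_database_id, partition_size
    return range(p * n + 1, p * (n + 1) + 1)


# Three remote databases sharing one partition size never collide...
ranges = [global_autoincrement_range(n, 1000) for n in (1, 2, 3)]
print([(r[0], r[-1]) for r in ranges])   # [(1001, 2000), (2001, 3000), (3001, 4000)]
# ...but each one runs out after `partition_size` inserts.
print(len(ranges[0]))                    # 1000

# A UUID needs no coordination at all, at the cost of a wider key.
key = uuid.uuid4()
print(len(key.bytes), len(str(key)))     # 16 36
```

GLOBAL AUTOINCREMENT keeps keys compact but requires administering global_database_id values and partition exhaustion; UUIDs avoid that at 16 bytes per key (36 characters if stored as a string), which is why native uniqueidentifier storage matters.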
68
Physical database design tips: foreign
keys
 Foreign keys are essential to the optimization of complex
queries
 Join selectivity and cardinality estimates are much more accurate
when foreign key constraints are present
 Also enable a variety of query rewrite optimizations
 But weigh the cost of declarative referential integrity
 Downside is the maintenance cost for indexes that are not utilized
in query processing
 In rare situations, consider eliminating some RI and check
constraints once application is fully tested
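The reasoning a declared foreign key enables can be checked on toy data. Because the parent key is unique, each child row matches at most one parent; because the foreign key exists (and is NOT NULL), each child row matches at least one. So the join's cardinality is exactly the child's row count, and a query that uses no parent columns can drop the join. The tables below are hypothetical and this is plain Python, not ASA's optimizer.

```python
# Hypothetical tables: customer(id PRIMARY KEY) and
# orders(id, cust_id NOT NULL REFERENCES customer(id)).
customer = {cid: f"customer {cid}" for cid in range(1, 101)}      # id -> name
orders = [(oid, oid * 7 % 100 + 1) for oid in range(1, 1001)]     # (id, cust_id)


def fk_holds(child, parent):
    """Every child row references an existing parent row."""
    return all(cust_id in parent for _, cust_id in child)


def join(child, parent):
    """Equijoin orders.cust_id = customer.id using a hash lookup on the PK."""
    return [(oid, cid, parent[cid]) for oid, cid in child if cid in parent]


assert fk_holds(orders, customer)
joined = join(orders, customer)
# PK uniqueness: each order matches at most one customer.
# FK + NOT NULL: each order matches at least one customer.
# So the join's cardinality is exactly the child's row count...
print(len(joined), len(orders))                              # 1000 1000
# ...and a query that needs no customer columns can skip the join entirely.
print([o for o, _, _ in joined] == [o for o, _ in orders])   # True
```

Without the constraint, the optimizer cannot assume either property and has to estimate the join size from statistics instead.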
69
Physical database design tips: triggers,
constraints
 Use declarative referential integrity instead of triggers
 Use CHECK constraints rather than triggers for simple
conditions
 9.0 supports named constraints
 Unnamed constraints are automatically named as ‘ASAnnn’
 Mark columns as NOT NULL when appropriate
 Don’t over-use CHECK constraints
 e.g. in user-defined data types
 Using a user-defined function in a CHECK constraint will
guarantee poor update performance
70
Server configuration tips: cache size
 Dynamic cache sizing is enabled by default on platforms that
support it
 Not supported on CE or NetWare
 Can override dynamic cache sizing as necessary
 Server can dynamically adjust cache size depending on server workload;
this is more robust in 9.0.1
 Use -ch to specify an upper bound larger than 256MB
 If specifying cache size at startup:
 Need to allow for OS and application overhead
 CE has different defaults than other platforms
 Java-enabled databases require a larger minimum cache for the Java
VM; 8 MB is usually sufficient
 Watch for competition from the NT file cache
 See white paper on memory usage (available at
http://www.ianywhere.com/developer)
71
Data management in Jasper
Statements concerning iAnywhere Solutions' new products are
forward-looking statements that involve a number of uncertainties
and risks and cannot be guaranteed. Factors that could ultimately
affect such statements are detailed from time to time in Sybase's
Securities and Exchange Commission filings, including but not
limited to its annual report on Form 10-K and its quarterly reports on
Form 10-Q (copies of which can be viewed on the Company's
website).
All of the information in this presentation consists of forward-looking
statements, as defined above. As such, there is uncertainty
about whether, or when, any of these features will be added to the
product.
72
Data management changes in Jasper
 Default page size changed to 4K
 New catalog implementation
 Catalog base tables have been renamed
 All catalog access by applications is through views
 Catalog base tables are reorganized and more efficient
 View dependencies on base tables and views are now tracked
 Improved storage organization for BLOB columns
 In-row BLOB prefix default is no longer fixed at 254:
 CHAR/VARCHAR: minimum 8, maximum 128
 BINARY/VARBINARY: minimum 0, maximum 256
 Can override on a per-column basis
 New storage architecture for long values, permits efficient random access
73
View dependency tracking
 Three states for any view:
 Valid: compiled and active, can be utilized in queries
 Invalid: view has been invalidated by the server due to
dependency checking as a result of DDL on base tables
 Upon reference, the server will attempt to compile the view and use it
if possible
 Otherwise, query will get an error
 Disabled: view has been explicitly disabled (via new statement,
DISABLE VIEW), and is unusable
 View must be explicitly enabled in order to become valid (via new
statement, ENABLE VIEW)
74
View dependency tracking
Upon an ALTER (or DROP):
 Server attempts to acquire an exclusive lock on the object to be
modified
 Server honours the current setting of the BLOCKING option
 Server then acquires exclusive locks on all dependent views
 If any lock cannot be acquired, the statement gets an error
 Once locked, all dependent views are invalidated
 ALTER (or DROP) statement is executed
 With ALTER, the server attempts to revalidate all the previously
invalidated views
 Views successfully recompiled are marked as valid
 Otherwise, the view is left in the invalid state
 Server will attempt to recompile it when
 First referenced in a server session, or
 When other DDL is performed that may affect that view
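The three view states and the ALTER sequence above can be summarized as a small state machine. This is a toy model under stated assumptions: a view "compiles" if its base table still has every column it uses, locking and the BLOCKING option are not modelled, and the handling of disabled views during ALTER and the failure behaviour of ENABLE VIEW are guesses marked in the comments.

```python
from enum import Enum


class ViewState(Enum):
    VALID = "valid"
    INVALID = "invalid"
    DISABLED = "disabled"


class Catalog:
    """Toy model of view dependency tracking (locking not modelled)."""

    def __init__(self):
        self.tables = {}   # table name -> set of column names
        self.views = {}    # view name -> (base table, set of columns used)
        self.state = {}    # view name -> ViewState

    def _compiles(self, view):
        table, cols = self.views[view]
        return table in self.tables and cols <= self.tables[table]

    def create_view(self, view, table, cols):
        self.views[view] = (table, set(cols))
        self.state[view] = ViewState.VALID if self._compiles(view) else ViewState.INVALID

    def alter_table(self, table, columns):
        # Dependent views; disabled views are left alone (an assumption).
        deps = [v for v, (t, _) in self.views.items()
                if t == table and self.state[v] is not ViewState.DISABLED]
        for v in deps:                       # invalidate all dependents
            self.state[v] = ViewState.INVALID
        self.tables[table] = set(columns)    # execute the ALTER
        for v in deps:                       # try to revalidate each one
            if self._compiles(v):
                self.state[v] = ViewState.VALID

    def reference(self, view):
        """Use a view in a query: disabled is an error, invalid is recompiled."""
        if self.state[view] is ViewState.DISABLED:
            raise RuntimeError(f"view {view} is disabled")
        if self.state[view] is ViewState.INVALID:
            if not self._compiles(view):
                raise RuntimeError(f"view {view} cannot be compiled")
            self.state[view] = ViewState.VALID
        return self.state[view]

    def disable_view(self, view):
        self.state[view] = ViewState.DISABLED

    def enable_view(self, view):
        # Assumption: enabling recompiles, and fails if that is impossible.
        if not self._compiles(view):
            raise RuntimeError(f"view {view} cannot be compiled")
        self.state[view] = ViewState.VALID


cat = Catalog()
cat.tables["t"] = {"a", "b"}
cat.create_view("v_a", "t", ["a"])
cat.create_view("v_ab", "t", ["a", "b"])
cat.alter_table("t", ["a"])            # drop column b
print(cat.state["v_a"].value, cat.state["v_ab"].value)   # valid invalid
cat.alter_table("t", ["a", "b"])       # later DDL restores b
print(cat.state["v_ab"].value)         # valid
```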
75
Internationalization improvements
 Support for NCHAR data type
 NCHAR strings are stored as UTF-8
 NCHAR specification and functions use character semantics, not byte
semantics
 NCHAR(10) means 10 characters (1-4 bytes per character)
 CHAR specification now supports either BYTE or CHAR modifier
 E.g. CHAR(10 BYTE) or CHAR(23 CHAR)
 NCHAR can support either
 UCA (Unicode Collation Algorithm) using IBM’s ICU library
 Properly supports multi-byte character sorting
 A legacy collation stored as UTF-8
 Database now can have two collations, one for NCHAR, one for CHAR
 Details in session SQL506 Monday afternoon
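The difference between character and byte semantics is easy to see by measuring the same strings both ways. The sketch assumes "character" means a Unicode code point (combining sequences could count differently) and that a byte-length CHAR column uses UTF-8; the helper names are hypothetical.

```python
def fits_nchar(value: str, n: int) -> bool:
    """NCHAR(n): length counted in characters (here, Unicode code points)."""
    return len(value) <= n


def fits_char_bytes(value: str, n: int) -> bool:
    """CHAR(n BYTE) in a UTF-8 database: length counted in encoded bytes."""
    return len(value.encode("utf-8")) <= n


for s in ["hello", "café", "日本語", "𝄞"]:
    print(s, len(s), len(s.encode("utf-8")))   # characters vs UTF-8 bytes
# hello 5 5 / café 4 5 / 日本語 3 9 / 𝄞 1 4

value = "日本語" * 3 + "X"               # 10 characters, 28 bytes
print(fits_nchar(value, 10), fits_char_bytes(value, 10))   # True False
```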
76
Indexing changes
 New index implementation
 Improved implementation of compressed B-tree indexes
 Key values are duplicated in the index to support index-only retrieval and
snapshot isolation
 Older “hash”-based indexes have been dropped entirely
 Index column order for primary keys now based on PK constraint
declaration, not column order in table
 PK can be altered, reordered without rewriting all the rows in the table
 An order can now be specified for any constraint index
 e.g. PRIMARY KEY (X ASC, Y DESC, Z ASC)
 Foreign key column order can now differ from that of the PK
 All indexes now appear in the SYSINDEXES view
 Planned:
 Ability to declare that a FK is unique (to enforce a 1:1 relationship)
 Abstract indexes into logical and physical implementations
 Redundant indexes will not be created
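The key order produced by a mixed-direction declaration such as PRIMARY KEY (X ASC, Y DESC, Z ASC) can be reproduced with an ordinary sort. The rows are hypothetical, and negating Y only works because it is numeric; for string columns one would sort in stages, relying on Python's stable sort.

```python
# Hypothetical rows (X, Y, Z) for an index declared as
# PRIMARY KEY (X ASC, Y DESC, Z ASC).
rows = [(1, 5, 2), (1, 7, 1), (2, 3, 9), (1, 5, 1), (2, 8, 0)]


def pk_order(row):
    x, y, z = row
    return (x, -y, z)   # negating the numeric Y column reverses its order


print(sorted(rows, key=pk_order))
# [(1, 7, 1), (1, 5, 1), (1, 5, 2), (2, 8, 0), (2, 3, 9)]
```

Within each X value the Y values run from high to low, and ties on Y fall back to ascending Z, regardless of where X, Y and Z sit in the table definition.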
77
Shareable global temporary tables
 Shared global temporary tables
 New syntax:
 CREATE GLOBAL TEMPORARY TABLE ….. SHARE BY ALL
 The contents of the table will persist until explicitly deleted or until
the database is shut down. On database startup, the table will be
empty.
 Row locking on shared temporary tables behaves the same as for
permanent tables
 Inserts, updates and deletes on shared temporary tables are not
recorded in the transaction log
 Column statistics are maintained in memory by the server.
78
Data management changes in Jasper
 The time of the last modification to any row in a table is now retained in
SYSTABLE
 Resolution is one second
 LOAD TABLE enhancements: better performance, ENCODING
option, ROW DELIMITED BY option
 Apply multiple transaction logs at startup (can specify a directory)
 Better row-level locking implementation
 Elimination of key-range locking with anti-insert locks
 Planned: introduction of INTENT locks (e.g. FETCH FOR UPDATE)
 Improved administration of large databases:
 Parallel backup
 Auto-tuning to exploit multiple CPUs on SMP hardware
 Faster unload/reload, index creation, database validation
79
Database mirroring
 Provides “hot” failover for a SQL Anywhere database
 Involves two or three separate servers: primary, mirror, arbiter
 Transaction log pages are passed from the primary server to the mirror to
keep the mirror up-to-date
 Mirror server is not accessible by any other connections
 Effectively the mirror server is in continuous recovery mode
 Log pages can be passed in three modes:
 Synchronously (default) on COMMIT
 Asynchronously on COMMIT – better performance than synchronous mode
 Asynchronously when log page is full, with a timeout option
 Async implies the usual caveats with possible lost transactions
 Role switch occurs if primary server fails
 Arbiter used to verify the mirror state before role switch proceeds
 Clients are disconnected from the primary server
 Must reconnect to the mirror
 See Techwave session SQL508 – High Availability ASA on Wednesday
80
Snapshot isolation support
 Provides read consistency in the face of concurrent writes from other
transactions (i.e. writers do not block readers)
 Enabled by a global database option, allow_snapshot_isolation
 Three new transaction isolation levels:
 “snapshot” – cleanest semantics, transaction sees a consistent view of
the database as of transaction start (the time the first row was accessed)
 “stmt-snapshot” – requires fewer resources; however, each statement sees
a consistent state of the database, but as of a different time
 Only one snapshot time exists for a connection; outermost or first statement
sets the transaction time
 “read-only-stmt-snapshot” – like stmt-snapshot, but only for queries;
update statements execute at isolation level 1
 Usage is not free
 Old copies of rows are maintained in a “row version store” (part of the
database’s temp file) for as long as necessary to ensure consistency for
any transaction
 Indexes have a mix of “old” and “current” values
 Can affect the performance of both sequential and index scans
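The idea behind the row version store can be sketched as a toy multiversion map. Readers see the newest version committed at or before their snapshot time, and old versions can be discarded only once no active snapshot needs them. Simplifications (assumptions, not ASA behaviour): the snapshot time is taken by an explicit call rather than at first row access, and uncommitted writes, update conflicts and index entries are not modelled.

```python
class VersionStore:
    """Toy multiversion store: each row keeps (commit_time, value) versions,
    oldest first. Readers see the newest version committed at or before
    their snapshot time; writers never overwrite an old version in place."""

    def __init__(self):
        self.versions = {}   # row key -> list of (commit_time, value)
        self.clock = 0

    def commit_write(self, key, value):
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def snapshot(self):
        return self.clock

    def read(self, key, snapshot_time):
        visible = [v for t, v in self.versions.get(key, []) if t <= snapshot_time]
        return visible[-1] if visible else None

    def prune(self, oldest_active_snapshot):
        """Discard versions that no active snapshot can still see."""
        for vs in self.versions.values():
            seen = [i for i, (t, _) in enumerate(vs) if t <= oldest_active_snapshot]
            if seen:
                del vs[:seen[-1]]   # keep the newest version visible to that snapshot

    def version_count(self):
        return sum(len(vs) for vs in self.versions.values())


store = VersionStore()
store.commit_write("balance", 100)
txn = store.snapshot()                  # "snapshot": one time for the whole transaction
store.commit_write("balance", 150)      # another connection commits an update
print(store.read("balance", txn))               # 100: consistent with transaction start
print(store.read("balance", store.snapshot()))  # 150: a new statement-level snapshot
store.prune(txn)
print(store.version_count())            # 2: the old version is still needed
store.prune(store.snapshot())           # after the snapshot transaction ends
print(store.version_count())            # 1
```

The second `prune` only frees space once the long-running snapshot is gone, which is why long snapshot transactions make the version store (and VersionStorePages) grow.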
81
Snapshot isolation support
 Setting the isolation level:
 set transaction isolation level snapshot
 set transaction isolation level statement snapshot
 set transaction isolation level read only statement snapshot
 Or within an ODBC application, use
 SA_SQL_TXN_SNAPSHOT
 SA_SQL_TXN_STATEMENT_SNAPSHOT
 SA_SQL_TXN_READ_ONLY_STATEMENT_SNAPSHOT
 Update conflicts are still possible
 Isolation levels can be mixed (but not recommended)
 Database property VersionStorePages contains the number of pages
in the temp file devoted to copies of old rows
 BLOB values do not reside in the temp file, but remain in the main
database file and are reference counted
 Some restrictions on DDL when snapshot transactions are in
progress (ALTER TABLE, etc.)
82
Lazy CHECKPOINTs
 A Jasper server can now initiate a CHECKPOINT and perform
other operations while it takes place.
 In previous releases, all database activity would stop while the
CHECKPOINT took place.
 There can only be one CHECKPOINT in progress at a time.
 If a CHECKPOINT is already in progress, then any operation like
an ALTER TABLE or CREATE INDEX that wants to initiate a new
CHECKPOINT needs to wait for the last one to finish.
 Lazy checkpoints are not used if using the -m option
 Marked by START CHECKPOINT and FINISH
CHECKPOINT records in the transaction log
83
Application profiling and request-level
logging
 Major enhancements in the Jasper release
 Unified logging architecture
 Can log data to a database, rather than a flat file
 Can log data to a different database, even on another server
 Much lower overhead
 Considerably greater detail in diagnostic information, including:
 Lock contention
 Statements within stored procedures and triggers
 Elapsed times
 Query plans
 Planned improvements to DBCONSOLE for real-time server status
 Attend sessions SQL501/514 Tuesday afternoon at 1:30
 ASA Performance Analysis from Start to Finish
84
iAnywhere at TechWave 2005
Ask the iAnywhere Experts on the Technology Boardwalk (exhibit hall)
• Drop in during exhibit hall hours and have all your questions answered by our technical
experts!
• Appointments outside of exhibit hall hours are also available to speak one-on-one with
our Senior Engineers. Ask questions or get your yearly technical review – ask us for
details!
TechWave ToGo Channel
• TechWave ToGo, an AvantGo channel providing up-to-date information about TechWave
classes, events, maps and more, now available via your handheld device!
• www.ianywhere.com/techwavetogo
iAnywhere Developer Community - A one-stop source for technical information!
• Access to newsgroups, new betas and code samples
• Monthly technical newsletters
• Technical whitepapers, tips and online product documentation
• Current webcast, class, conference and seminar listings
• Excellent resources for commonly asked questions
• All available express bug fixes and patches
• Network with thousands of industry experts
http://www.ianywhere.com/developer/
85
SQL Anywhere ‘Jasper’ Release
Learn more about 'Jasper', the upcoming SQL Anywhere release, loaded
with features focused on:
• Enhanced data management including performance, data protection, and
developer productivity
• Innovative data movement including manageability, flexibility and
performance, and messaging
Attend the following sessions:
SQL Anywhere 'Jasper' New Feature Overview
Session SQL512 will be held Monday, August 22nd, 1:30pm
MobiLink 'Jasper' New Feature Overview
Session SQL515 will be held Wednesday, August 24th, 1:30pm
... and remember to look for sneak peeks in other sessions and morning
education courses!
Register for the Jasper Beta program:
www.ianywhere.com/jasper
86
Questions
?
87