Download forcing updates one row at a time.

Communicating with the Outside Overview Package several SQL statements within one call to the database server • Embedded procedural language (Transact SQL) with control flow facilities. • Use User Defined Functions (UDFs) when they select out a high number of records. • Do not use positioned updates, which update rows that are obtained as the result of a query, forcing updates one row at a time. Packing sqls into one call to db server create table Temp1 (parent varchar(200)) create table Temp2(parent varchar(200)) create table Ancestor (person varchar(200)) /∗ Temp2will hold the latest generation discovered. ∗/ INSERT INTO Temp1 SELECT parent FROM Parental WHERE child = ’Nicholas Bennet’; WHILE EXISTS(SELECT * FROM Temp1) BEGIN INSERT INTO Ancestor SELECT * FROM Temp1; INSERT INTO Temp2 SELECT * FROM Temp1; DELETE FROM Temp1; INSERT INTO Temp1 SELECT Parental.parent FROM Parental, Temp2 WHERE Parental.child = Temp2.parent; DELETE FROM Temp2; IF EXISTS ( SELECT * FROM Ancestor WHERE person = ’Carol Diane’ ) PRINT ’Carol Diane is an ancestor of Nicholas Bennet.’ ELSE PRINT ’Carol Diane is not an ancestor of Nicholas Bennet.’ T-SQL to determines whether Carol Diane is an ancestor ofNicholas Bennet. Assume a genealogical database containing at least the relation Parental(parent, child). User Defined Functions Throughput (queries/msec) 6 UDF 5 application function 4 3 2 1 0 20% 80% • Function computes the number of working days between two dates. • Function executed either on the database site (UDF) or on the application site • Applying the UDF yields good performances when it helps reduce significantly the amount of data sent back to the application. Throughput (queries/msec) Retrieve Needed Columns Only • Avoid transferring unnecessary data • Might prevent the use of a covering index, i.e. index-only scan • In the experiment the subset contains ¼ of the attributes. 1.75 1.5 1.25 all covered subset 1 0.75 0.5 0.25 0 no index index – Reducing the amount of data that crosses the application interface yields significant performance improvement. Retrieve Needed Rows Only • If the user is only viewing a small subset of a very large result set, it is best to – Only transfer that subset – Only compute that subset • In cases when users only want a ‘feel’ of the data, use TOP or FETCH FIRST to fetch just a few rows • Applications that allow the formulation of ad-hoc queries(An Ad Hoc Query is a query that cannot be determined prior to the moment the query is issued I ) should permit users to cancel them. Throughput (queries/sec) Minimize the Number of Query Compilations • Prepared execution yields better performance when the query is executed more than once: 0.6 0.4 – No compilation – No access to catalog. 0.2 direct prepared 0 0 1 2 3 4 5 6 Experiment performed on Oracle8iEE on Windows 2000. • Prepared execution plans become obsolete if indexes are added or the size of the relation changes. Direct: ODBC direct execution Prepared: ODBC prepared command, then execute it repeatedly . Tuning the Application Interface • Avoid user interaction within a transaction • Minimize the number of roundtrips between the application and the database • Retrieve needed columns only • Retrieve needed rows only • Minimize the number of query compilations Bulk Loading Data • Purpose: loading large volumes of data into a database • Tools – SQL server: bcp and a Transact-SQL command BULK INSERT. – Oracle: SQL ∗ Loader – DB2: Load utility • Tool parameters: – – – – – Bypass query engine Avoid logging No index update No constraint check Frequency of commits Use Direct Path for Bulk Loading sqlldr directpath=true control=load_lineitem.ctl data=E:\Data\lineitem.tbl load data infile "lineitem.tbl" into table LINEITEM append fields terminated by '|' ( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE DATE "YYYYMM-DD", L_COMMITDATE DATE "YYYY-MM-DD", L_RECEIPTDATE DATE "YYYY-MM-DD", L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT ) Direct Path Throughput (rec/sec) 50000 40000 30000 20000 10000 65 0 conventional direct path insert Experiment performed on Oracle8iEE on Windows 2000. • Loading 600000 records into the lineitem relation from TPCH • Direct path loading bypasses the query engine and the storage manager. It is orders of magnitude faster than conventional path (with a commit every 100 records) and inserts (with a commit for each record). Batch Size Throughput (records/sec) 5000 4000 3000 2000 1000 0 0 100000 200000 300000 400000 500000 Experiment performed on SQL Server 2000 on Windows 2000. 600000 • Bulk load of 600000 records. • Throughput increases steadily when the batch size increases to 100000 records.Throughput remains constant afterwards. • Trade-off between performance and amount of data that has to be reloaded in case of problem. • Bulk load of 600000 records. • As expected: 8000 6000 4000 Experiment performed on IBM DB2 UDB V7.1 on Windows 2000. incremental index maintenance statistics 0 non recoverable 2000 recoverable no statistics rebuild index THroughput (rec/sec) Storage Engine Parameters – Turning off logging helps. – Collecting statistics hurts – Maintaining indexes incrementally hurts a lot. Connecting to Multiple Databases • Shared connection to reduce start-up cost – Connection pooling • Pass-through statements when performance is CPU bound – E.g. dblink in Oracle – Eliminates query rewriting, translation and postprocesing to adapt to specific SQL dialect • Transfer large block of data to minimize roundtrips when performance is network bound, summary

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Top subcategories

Download forcing updates one row at a time.