Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Oracle Database wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Functional Database Model wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Clusterpoint wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Relational model wikipedia , lookup
Communicating with the Outside Overview Package several SQL statements within one call to the database server • Embedded procedural language (Transact SQL) with control flow facilities. • Use User Defined Functions (UDFs) when they select out a high number of records. • Do not use positioned updates, which update rows that are obtained as the result of a query, forcing updates one row at a time. Packing sqls into one call to db server create table Temp1 (parent varchar(200)) create table Temp2(parent varchar(200)) create table Ancestor (person varchar(200)) /∗ Temp2will hold the latest generation discovered. ∗/ INSERT INTO Temp1 SELECT parent FROM Parental WHERE child = ’Nicholas Bennet’; WHILE EXISTS(SELECT * FROM Temp1) BEGIN INSERT INTO Ancestor SELECT * FROM Temp1; INSERT INTO Temp2 SELECT * FROM Temp1; DELETE FROM Temp1; INSERT INTO Temp1 SELECT Parental.parent FROM Parental, Temp2 WHERE Parental.child = Temp2.parent; DELETE FROM Temp2; IF EXISTS ( SELECT * FROM Ancestor WHERE person = ’Carol Diane’ ) PRINT ’Carol Diane is an ancestor of Nicholas Bennet.’ ELSE PRINT ’Carol Diane is not an ancestor of Nicholas Bennet.’ T-SQL to determines whether Carol Diane is an ancestor ofNicholas Bennet. Assume a genealogical database containing at least the relation Parental(parent, child). User Defined Functions Throughput (queries/msec) 6 UDF 5 application function 4 3 2 1 0 20% 80% • Function computes the number of working days between two dates. • Function executed either on the database site (UDF) or on the application site • Applying the UDF yields good performances when it helps reduce significantly the amount of data sent back to the application. Throughput (queries/msec) Retrieve Needed Columns Only • Avoid transferring unnecessary data • Might prevent the use of a covering index, i.e. index-only scan • In the experiment the subset contains ¼ of the attributes. 1.75 1.5 1.25 all covered subset 1 0.75 0.5 0.25 0 no index index – Reducing the amount of data that crosses the application interface yields significant performance improvement. Retrieve Needed Rows Only • If the user is only viewing a small subset of a very large result set, it is best to – Only transfer that subset – Only compute that subset • In cases when users only want a ‘feel’ of the data, use TOP or FETCH FIRST to fetch just a few rows • Applications that allow the formulation of ad-hoc queries(An Ad Hoc Query is a query that cannot be determined prior to the moment the query is issued I ) should permit users to cancel them. Throughput (queries/sec) Minimize the Number of Query Compilations • Prepared execution yields better performance when the query is executed more than once: 0.6 0.4 – No compilation – No access to catalog. 0.2 direct prepared 0 0 1 2 3 4 5 6 Experiment performed on Oracle8iEE on Windows 2000. • Prepared execution plans become obsolete if indexes are added or the size of the relation changes. Direct: ODBC direct execution Prepared: ODBC prepared command, then execute it repeatedly . Tuning the Application Interface • Avoid user interaction within a transaction • Minimize the number of roundtrips between the application and the database • Retrieve needed columns only • Retrieve needed rows only • Minimize the number of query compilations Bulk Loading Data • Purpose: loading large volumes of data into a database • Tools – SQL server: bcp and a Transact-SQL command BULK INSERT. – Oracle: SQL ∗ Loader – DB2: Load utility • Tool parameters: – – – – – Bypass query engine Avoid logging No index update No constraint check Frequency of commits Use Direct Path for Bulk Loading sqlldr directpath=true control=load_lineitem.ctl data=E:\Data\lineitem.tbl load data infile "lineitem.tbl" into table LINEITEM append fields terminated by '|' ( L_ORDERKEY, L_PARTKEY, L_SUPPKEY, L_LINENUMBER, L_QUANTITY, L_EXTENDEDPRICE, L_DISCOUNT, L_TAX, L_RETURNFLAG, L_LINESTATUS, L_SHIPDATE DATE "YYYYMM-DD", L_COMMITDATE DATE "YYYY-MM-DD", L_RECEIPTDATE DATE "YYYY-MM-DD", L_SHIPINSTRUCT, L_SHIPMODE, L_COMMENT ) Direct Path Throughput (rec/sec) 50000 40000 30000 20000 10000 65 0 conventional direct path insert Experiment performed on Oracle8iEE on Windows 2000. • Loading 600000 records into the lineitem relation from TPCH • Direct path loading bypasses the query engine and the storage manager. It is orders of magnitude faster than conventional path (with a commit every 100 records) and inserts (with a commit for each record). Batch Size Throughput (records/sec) 5000 4000 3000 2000 1000 0 0 100000 200000 300000 400000 500000 Experiment performed on SQL Server 2000 on Windows 2000. 600000 • Bulk load of 600000 records. • Throughput increases steadily when the batch size increases to 100000 records.Throughput remains constant afterwards. • Trade-off between performance and amount of data that has to be reloaded in case of problem. • Bulk load of 600000 records. • As expected: 8000 6000 4000 Experiment performed on IBM DB2 UDB V7.1 on Windows 2000. incremental index maintenance statistics 0 non recoverable 2000 recoverable no statistics rebuild index THroughput (rec/sec) Storage Engine Parameters – Turning off logging helps. – Collecting statistics hurts – Maintaining indexes incrementally hurts a lot. Connecting to Multiple Databases • Shared connection to reduce start-up cost – Connection pooling • Pass-through statements when performance is CPU bound – E.g. dblink in Oracle – Eliminates query rewriting, translation and postprocesing to adapt to specific SQL dialect • Transfer large block of data to minimize roundtrips when performance is network bound, summary