On the Estimation of Query Execution Time in Object Oriented Databases at the Early Design Stages

Aleksey Burdakov, Yuri Grigorev (Bauman Moscow State Technical University, Russia)
Andrey Poultenko (Amur State University, Russia)
17.08.02 ADBIS'2002 Bratislava Slovakia

The Problem
• Predict query execution time in an OODBMS given early-stage design specifications
  – database schema
  – database objects (data)
  – queries and transactions
  – OODBMS, hardware and software platform
• A hard problem
  – requires knowledge of the OODB model,
  – … query execution algorithms and optimisation
  – … storage methods

Motivation
• System architect evaluates alternatives on the following criteria:
  – cost, functional and non-functional requirements (performance, reliability, usability, etc.)
• Performance
  – one of the most important criteria
  – hard to predict
• Our contribution
  – methods for query execution time prediction
  – methods incorporated into decision support tool “ADAMDE”

Outline
• Object Oriented Database Model
• Evaluation Methods
  – Forward Join Algorithm
  – Reverse Join Algorithm
  – Page Structure and Physical I/O
• Decision support tool “ADAMDE”

OODB Model
• Features
  – Associative Relationships
  – Unique Object Identifiers
  – Class Inheritance Hierarchies
  – Collection Types, Functions, etc.
[Figure: classes R and S linked by reference attributes <REFRS> and <REFST>; objects carry an OID; extent sizes are labelled N and K.]
• Query sample:
  select R.a, S.b
  from R, S
  where FR and FS
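The sample query above can be pictured with a toy in-memory sketch. It is an illustration only, not the authors' model: the attribute name `ref_s`, the predicate arguments `f_r` and `f_s`, and the sample data are hypothetical, and the navigation direction (from R to S through stored OIDs) is assumed to correspond to what the later slides call the forward join.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SObj:
    oid: int  # unique object identifier
    b: int    # attribute returned by the query


@dataclass
class RObj:
    oid: int
    a: int
    # OIDs of the S objects this R object refers to (hypothetical name).
    ref_s: List[int] = field(default_factory=list)


def forward_join(
    r_extent: List[RObj],
    s_by_oid: Dict[int, SObj],
    f_r: Callable[[RObj], bool],
    f_s: Callable[[SObj], bool],
) -> List[Tuple[int, int]]:
    """Evaluate: select R.a, S.b from R, S where FR and FS,
    by following references from R to S."""
    result = []
    for r in r_extent:
        if not f_r(r):           # predicate FR on R
            continue
        for oid in r.ref_s:      # follow each reference R -> S
            s = s_by_oid[oid]    # object access by OID
            if f_s(s):           # predicate FS on S
                result.append((r.a, s.b))
    return result


# Made-up sample data.
s_by_oid = {1: SObj(1, 10), 2: SObj(2, 20), 3: SObj(3, 30)}
r_extent = [RObj(1, 100, [1, 2]), RObj(2, 200, [2]), RObj(3, 300, [3])]

rows = forward_join(r_extent, s_by_oid, lambda r: r.a >= 200, lambda s: s.b <= 20)
print(rows)  # [(200, 20)]
```

In this sketch each dereference `s_by_oid[oid]` stands in for a pointer lookup followed by an object access, which are the kinds of costs a time estimate would have to account for.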

OODB Model: Class Hierarchy
[Figure: class hierarchy (classes R, S, T; labels ST, S/ST, T/ST, RS, RT, OID, type) at the logical level and its physical representation, with the corresponding query operations (select, select + join). Panels: (a) horizontal partitioning, (b) vertical partitioning, (c) clustered storage.]

Evaluation Methods: Mathematical Apparatus
• Parameters are Random Variables
• Discrete: Generating Function (GF)
  – sum of independent random variables
  – mathematical expectation
• Continuous: Laplace-Stieltjes Transform (LST)

Forward Join Algorithm
• Algorithm (model)
[Figure: join from R to S through the reference attribute <REFRS>; labels LOID, POID, OID, K, N.]
• Query execution time (formulas not preserved), with notation:
  – … : LST of join execution time
  – … : its mathematical expectation
  – … : number of references from R to S
  – h_j : probability of reference to the j-th object of S from R
  – w : physical pointer search time
  – q : object access time

Forward Join Algorithm (Cont.)
• Number of selected objects
  – mathematical expectation of the number of selected objects from S
  – degenerate case: h_j = h
  – K_S : rows selected from S; N_R : rows selected from R
• Estimation of result set size

Reverse Join Algorithm
• Works in the opposite direction to the forward join algorithm
• Estimation of query execution time
• Estimation of result set size

Page Structure and Physical I/O
• Problem:
  – objects are stored on database pages and accessed randomly
  – parameters are random variables
  – determine physical input/output size
• Solution: classical combinatorial urn model
  [Figure: a row of urns numbered 1, 2, 3, 4, ..., m, some of them empty.]
  – GF: x and y mark the number of occupied urns and the number of balls

Page Structure and I/O (Cont.)
• Number of accessed pages
  – V1(z) : GF of the number of pages occupied by an extent
  – V2(z) : GF of the number of accessed objects

Advanced Set of Tools for Analysis of Database Access Models in Distributed Environments (ADAMDE)
[Figure: ADAMDE structure. Inputs: network topology; node configuration and characteristics (hardware and software); database schema and statistics; queries and transactions. Blocks: Network, Node, Relational DBMS, Object-oriented DBMS, Register System (nodes, OS, DBMS, network characteristics). Values passed between blocks: workload, result set size, query execution time. Outputs: utilisation, bottlenecks, response time, query execution time.]

Experiments
• Used for analysis in the following cases:
  – Cellular phone network (Beeline)
  – Telephone billing system (WestCall)
  – Internet shop (Ramenka)
  – ERP Analytical subsystem (Vesko)
• Accuracy
  – 10-40% error for normally utilised systems
  – larger error for over-utilised systems (with bottlenecks)

Summary
• Mathematical methods for query execution prediction:
  – execution time
  – result set size
• Require only early-stage design specifications
• Methods released in “ADAMDE”

Future Work
• Complex predicates in the “where” clause
• New query execution algorithms
• Particular forms of distribution functions (e.g. Gaussian distribution)

Thank you!
Presentation materials are available at http://geocities.com/burdakov

Appendix 1: Degenerate Case: Yao’s Formulae
[Figure: plot of Yao(m,b,n) against n (0 to 10) for b = 2..5, together with the bounds fU(n) and fL(n); the vertical axis runs from 0 to 3, and lim Yao(m,b,n) = m.]
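The behaviour plotted in the appendix can be reproduced with a short sketch of Yao's formula. The argument meanings are an assumption, since the slide text does not define them: m is taken as the number of pages, b as the number of objects per page, and n as the number of distinct objects accessed at random. The with-replacement urn estimate (often called Cardenas' formula) is included only for comparison; it is not claimed to be the fU or fL curve from the slide.

```python
from math import comb


def yao(m: int, b: int, n: int) -> float:
    """Expected number of pages touched when n distinct objects are chosen
    at random from m pages holding b objects each (without replacement)."""
    total = m * b
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and m*b")
    # comb(total - b, n) / comb(total, n) is the probability that a given
    # page holds none of the n chosen objects.
    return m * (1 - comb(total - b, n) / comb(total, n))


def cardenas(m: int, n: int) -> float:
    """Expected number of occupied urns when n balls are thrown
    independently and uniformly into m urns (with replacement)."""
    return m * (1 - (1 - 1 / m) ** n)


# Values for m = 3 pages and b = 5 objects per page, n = 0..10
# (parameter values chosen to resemble the plotted range; an assumption).
table = [(n, yao(3, 5, n), cardenas(3, n)) for n in range(11)]
for n, y, c in table:
    print(f"n={n:2d}  yao={y:.3f}  cardenas={c:.3f}")
```

As n grows, `yao(m, b, n)` approaches m, which matches the limit stated on the slide; once n exceeds m*b - b every page must be touched and the value equals m exactly.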