On the Estimation of
Query Execution Time in
Object Oriented Databases
at the Early Design Stages
Aleksey Burdakov,
Yuri Grigorev
(Bauman Moscow State Technical University, Russia)
Andrey Poultenko
(Amur State University, Russia)
The Problem
• Predict query execution time in an OODBMS
given early-stage design specifications:
– database schema
– database objects (data)
– queries and transactions
– OODBMS, hardware and software platform
• A hard problem
– requires knowledge of the OODB model,
– … query execution algorithms and optimisation,
– … storage methods
17.08.02
ADBIS'2002 Bratislava Slovakia
Motivation
• System architect evaluates alternatives on
the following criteria:
– cost, functional and non-functional requirements
(performance, reliability, usability, etc.)
• Performance
– one of the most important criteria
– hard to predict
• Our contribution
– methods for query execution time prediction
– methods incorporated into decision support tool
“ADAMDE”
Outline
• Object Oriented Database Model
• Evaluation Methods
– Forward Join Algorithm
– Reverse Join Algorithm
– Page Structure and Physical I/O
• Decision support tool “ADAMDE”
OODB Model
• Features
– Associative Relationships
– Unique Object Identifiers
[Figure: classes R and S linked by OID references; figure labels: <REFRS>, <REFST>, OID S, R, S, K, N]
Query sample:
select R.a, S.b
from R, S
where FR and FS
– Class Inheritance Hierarchies
– Collection Types, Functions, etc.
OODB Model: Class Hierarchy
[Figure: logical level (query over the classes R, S, T) versus physical representation; figure labels: S/ST, T/ST, RS, RT, ST, OID, type]
(a) - horizontal partitioning
(b) - vertical partitioning
(c) - clustered storage
Query operations labelled in the figure: select; select + join
Evaluation Methods:
Mathematical Apparatus
• Parameters are Random Variables
• Discrete: Generating Function (GF)
– sum of independent random variables
– mathematical expectation
• Continuous: Laplace-Stieltjes Transform (LST)
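The two properties listed for generating functions, and their counterparts for the Laplace-Stieltjes transform, are standard results and can be written out as below. The symbols X, Y, p_k, F, G_X and \varphi_X are notation chosen here for illustration; they are not taken from the slides.

```latex
\begin{align*}
  % Generating function of a discrete random variable X with P(X = k) = p_k
  G_X(z) &= \mathbb{E}\!\left[z^{X}\right] = \sum_{k \ge 0} p_k z^{k} \\
  % Sum of independent X and Y: the generating functions multiply
  G_{X+Y}(z) &= G_X(z)\, G_Y(z) \\
  % Mathematical expectation from the first derivative at z = 1
  \mathbb{E}[X] &= G_X'(1) \\
  % LST of a non-negative random variable X with distribution function F
  \varphi_X(s) &= \mathbb{E}\!\left[e^{-sX}\right] = \int_0^{\infty} e^{-sx}\, dF(x) \\
  % Same two properties for the LST
  \varphi_{X+Y}(s) &= \varphi_X(s)\, \varphi_Y(s), \qquad
  \mathbb{E}[X] = -\varphi_X'(0)
\end{align*}
```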
Forward Join Algorithm
• Algorithm (model)
[Figure: forward join from R to S through the references <REFRS>; figure labels: LOID, POID, OID S, R, S, K, N]
• Query execution time
- LST of join execution time
- its mathematical expectation
- number of references from R to S
hj - probability of a reference to the j-th object of S from R
w - physical pointer search time
q - object access time
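The formulas on this slide did not survive in the transcript, so the following is only a minimal sketch of how an LST and its expectation can be combined for a join of this kind. It assumes (this is not confirmed by the slides) that every reference from R to S costs one pointer search w plus one object access q, that all these times are independent, and that the number of references is independent of them. Under those assumptions the LST of the total time is the GF of the reference count evaluated at the product of the two LSTs, and the expectation follows from Wald's identity. All function names, the exponential time distributions and the numeric inputs are hypothetical.

```python
def join_time_lst(s, ref_count_pmf, lst_w, lst_q):
    """LST of T = sum over references of (w + q).

    Assumption: T is a random sum, so its LST is the generating function
    of the reference count evaluated at lst_w(s) * lst_q(s).
    """
    z = lst_w(s) * lst_q(s)
    return sum(p * z ** n for n, p in ref_count_pmf.items())


def expected_join_time(ref_count_pmf, mean_w, mean_q):
    """E[T] = E[number of references] * (E[w] + E[q]) (Wald's identity)."""
    mean_refs = sum(n * p for n, p in ref_count_pmf.items())
    return mean_refs * (mean_w + mean_q)


def exponential_lst(mean):
    """LST of an exponential distribution with the given mean (assumed shape)."""
    def lst(s):
        return 1.0 / (1.0 + mean * s)
    return lst


# Hypothetical inputs: distribution of the number of references from R to S,
# mean pointer search time and mean object access time in seconds.
ref_count_pmf = {0: 0.2, 1: 0.5, 2: 0.3}
mean_w, mean_q = 0.001, 0.010

expected = expected_join_time(ref_count_pmf, mean_w, mean_q)

# Cross-check: the expectation is minus the derivative of the LST at s = 0,
# approximated here by a forward finite difference.
eps = 1e-6
numeric = (1.0 - join_time_lst(eps, ref_count_pmf,
                               exponential_lst(mean_w),
                               exponential_lst(mean_q))) / eps
print(expected, numeric)
```

The finite-difference check is there only to show that the expectation and the LST describe the same quantity; a real estimate would use measured distributions for w and q.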
Forward Join Algorithm (Cont.)
• Number of selected objects
Mathematical expectation of the number of
selected objects from S
Degenerate case: hj = h, KS - rows selected from S,
NR - rows selected from R
• Estimation of result set size
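The slide's formulas are missing from the transcript, so the sketch below shows one plausible reading of the degenerate case rather than the authors' expression. It assumes that each of the NR selected objects of R references a given object of S independently with probability hj, and that an object of S is selected when at least one such reference reaches it. The function names are hypothetical.

```python
def expected_selected(h_by_object, n_r):
    """Expected number of S objects referenced at least once.

    Assumption: each of the n_r selected R objects references the j-th
    object of S independently with probability h_by_object[j].
    """
    return sum(1.0 - (1.0 - h) ** n_r for h in h_by_object)


def expected_selected_degenerate(k_s, n_r, h):
    """Degenerate case hj = h for all k_s candidate objects of S."""
    return k_s * (1.0 - (1.0 - h) ** n_r)


# Hypothetical numbers: 100 candidate objects of S, 10 selected objects of R,
# reference probability 0.01.
print(expected_selected_degenerate(100, 10, 0.01))
```

With equal probabilities the general sum collapses to the closed form, which is why the degenerate case is convenient for quick estimates.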
Reverse Join Algorithm
• Works in the opposite direction to the forward
join algorithm
• Estimation of query execution time
• Estimation of result set size
Page Structure and Physical I/O
• Problem:
– objects are stored on database pages and accessed
randomly
– parameters are random variables
– determine physical input/output size
• Solution: Classical combinatorial urn model
[Figure: m urns numbered 1, 2, 3, 4, ..., m, some of them empty]
– GF: x and y mark the number of occupied urns
and the number of balls
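As a small illustration of the urn model, the sketch below computes the expected number of occupied urns (pages) when n balls (object accesses) fall independently and uniformly at random into m urns, and checks the closed form against a simulation. The uniform, with-replacement assumption and the function names are choices made here; the slides work with the full generating function instead.

```python
import random


def expected_occupied_urns(m, n):
    """Expected number of non-empty urns after n balls are thrown
    independently and uniformly at random into m urns."""
    if m <= 0:
        raise ValueError("m must be positive")
    return m * (1.0 - (1.0 - 1.0 / m) ** n)


def simulate_occupied_urns(m, n, trials=20000, seed=0):
    """Monte Carlo estimate of the same quantity, for cross-checking."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # The set holds the distinct urns hit in this trial.
        total += len({rng.randrange(m) for _ in range(n)})
    return total / trials


print(expected_occupied_urns(10, 10), simulate_occupied_urns(10, 10))
```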
Page Structure and I/O (Cont.)
• Number of accessed pages
– V1(z) - GF of the number of pages occupied by an
extent
– V2(z) - GF of the number of accessed objects
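When both the number of pages and the number of accessed objects are random, one simple way to obtain an expectation is to average the urn-model result over both distributions. The sketch below does that, taking the two distributions as probability mass functions (the coefficients of V1(z) and V2(z)). It assumes the two quantities are independent and that accesses are uniform with replacement; the slide's own formula is not preserved, so this is an illustration, not the authors' result. The function name is hypothetical.

```python
def expected_accessed_pages(pages_pmf, objects_pmf):
    """Average m * (1 - (1 - 1/m)**n) over the page-count distribution
    (pages_pmf: {m: probability}) and the accessed-object distribution
    (objects_pmf: {n: probability}), assuming the two are independent."""
    total = 0.0
    for m, p_m in pages_pmf.items():
        if m == 0:
            continue  # an empty extent contributes no page accesses
        for n, p_n in objects_pmf.items():
            total += p_m * p_n * m * (1.0 - (1.0 - 1.0 / m) ** n)
    return total


# Hypothetical distributions: the extent occupies 8 or 12 pages,
# and a query touches 5 or 20 objects.
print(expected_accessed_pages({8: 0.5, 12: 0.5}, {5: 0.7, 20: 0.3}))
```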
ADAMDE: Advanced Set of Tools for Analysis of Database
Access Models in Distributed Environments
[Figure: ADAMDE architecture]
• Inputs
– network topology
– node configuration and characteristics
(hardware and software)
– database schema and statistics
– queries and transactions
• Component labels in the figure
– Network, Node, Relational DBMS, Object-oriented DBMS, Register, System
• Data passed between components
– workload
– result set size
– query execution time
– node, OS, DBMS and network characteristics
– utilisation
• Outputs
– bottlenecks
– utilisation
– response time
– query execution time
Experiments
• Used for analysis in the following cases:
– Cellular phone network (Beeline)
– Telephone billing system (WestCall)
– Internet shop (Ramenka)
– ERP Analytical subsystem (Vesko)
• Accuracy
– prediction error of 10-40% for normally utilised systems
– larger error for over-utilised systems (with bottlenecks)
Summary
• Mathematical methods for query execution
prediction:
– execution time
– result set size
• Require only early-stage design
specifications
• Methods implemented in “ADAMDE”
Future Work
• Complex predicates in “where” clause
• New query execution algorithms
• Particular forms of distribution functions (e.g.
Gaussian distribution)
Thank you!
Presentation materials are available at
http://geocities.com/burdakov
Appendix 1:
Degenerate Case: Yao’s Formulae
Yao(m,b,n)
lim Yao(m,b,n) = m
[Figure: plot of fU(n) and fL(n) against n (0 to 10) for b = 2..5; vertical axis from 0 to 3]
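For reference, Yao's formula in its usual textbook form can be evaluated as sketched below. The mapping of the slide's arguments is an assumption made here: m is taken as the number of pages, b as the number of objects per page, and n as the number of distinct objects selected without replacement. The function name is hypothetical.

```python
import math


def yao(m, b, n):
    """Expected number of pages touched when n distinct objects are selected
    at random, without replacement, from m pages holding b objects each.

    Assumed argument meaning: m pages, b objects per page, n selected objects.
    """
    total = m * b
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and m * b")
    # Probability that one particular page is missed by all n selections.
    # math.comb returns 0 when n exceeds total - b, so the result reaches m.
    miss = math.comb(total - b, n) / math.comb(total, n)
    return m * (1.0 - miss)


# Hypothetical example: 100 pages, 5 objects per page, 50 objects selected.
print(yao(100, 5, 50))
```

Once n exceeds m * b - b, no page can be missed entirely and the value equals m, which matches the limit stated on the slide.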