* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download ppt
Survey
Document related concepts
Serializability wikipedia , lookup
Tandem Computers wikipedia , lookup
Microsoft Access wikipedia , lookup
Oracle Database wikipedia , lookup
Concurrency control wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Ingres (database) wikipedia , lookup
Functional Database Model wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Versant Object Database wikipedia , lookup
Clusterpoint wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Relational algebra wikipedia , lookup
Transcript
The MonetDB Architecture Martin Kersten CWI Amsterdam M.Kersten 2008 1 Try to keep things simple Database Structures Execution Paradigm Query optimizer DBMS Architecture M.Kersten 2008 2 MonetDB quickstep End-user application SQL JDBC ODBC XQuery Python Perl C-mapi lib MAPI protocol MonetDB kernel M.Kersten Sep 2008 PHP RoR The MonetDB Software Stack SQL/XML SOAP Open-GIS XQuery SQL 03 Optimizers MonetDB 4 X100 MonetDB 5 MonetDB kernel An advanced column-oriented DBMS M.Kersten Sep 2008 Compile? MonetDB quickstep SQL MAL The MonetDB Assembly Language glues the layers together. - binary relational model operators - imperative constructs - functional abstractions MAL MonetDB Kernel MonetDB Server Original: P. A. Boncz, M. L. Kersten. MIL Primitives for Querying a Fragmented World. The VLDB Journal, 8(2):101-119, October 1999. M.Kersten 2008 5 MonetDB quickstep select count(*) from photoobjall; function user.s3_1():void; X1:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",0); X6:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",1); X9:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",2); X13:bat[:oid,:oid] := sql.bind_dbat("sys","photoobjall",1); X8 := algebra.kunion(X1,X6); X11 := algebra.kdifference(X8,X9); X12 := algebra.kunion(X11,X9); X14 := bat.reverse(X13); X15 := algebra.kdifference(X12,X14); X18 := algebra.markT(X15,0@0); X19 := bat.reverse(X18); X20 := aggr.count(X19); sql.exportValue(1,"sys.","count_","int",32,0,6,X20,""); end s3_1; 0 base table delete 2 update 1 insert SQL MAL Tactical Optimizers MAL MonetDB Kernel MonetDB Server MonetDB quickstep Strategic optimizer: – Exploit the lanuage the language – Rely on heuristics SQL MAL Tactical MAL optimizer: –Modular optimizer framework –Focused on coarse grain resource optimization Operational optimizer: – Exploit everything you know at runtime – Re-organize if necessary M.Kersten 2008 MAL optimizers MAL MonetDB Kernel MonetDB Server 7 MonetDB quickstep select count(*) from photoobjall; function user.s3_1():void; X1:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",0); X20 := aggr.count(X1); sql.exportValue(1,"sys.","count_","int",32,0,6,X20,""); end s3_1; SQL MAL Optimizer pipelines. Tactical Optimizers sql> select optimizer; inline,remap,evaluate,costModel,coercions ,emptySet,aliases,mergetable,deadcode,c onstants,commonTerms,joinPath,deadcod e,reduce,garbageCollector,dataflow,history ,replication,multiplex MAL MonetDB Kernel MonetDB Server MonetDB quickstep select count(*) from photoobjall; function user.s3_1():void; X1:bat[:oid,:lng] := sql.bind("sys","photoobjall","objid",0); X20 := aggr.count(X1); sql.exportValue(1,"sys.","count_","int",32,0,6,X20,""); end s3_1; SQL MAL Kernel execution paradigms Tactical Optimizers Tuple-at-a-time pipelined Operator-at-a-time MAL MonetDB Kernel MonetDB Server Try to keep things simple MonetDB storage Database Structures PAX stores M.Kersten 2008 N-ary stores Column stores 10 Try to keep things simple Early 80s: tuple storage structures for PCs were simple OK John 32 Houston OK Mary 31 Houston Easy to access at the cost of wasted space M.Kersten 2008 11 Try to keep things simple Slotted pages Logical pages equated physical pages 32 John Houston 31 M.Kersten 2008 Mary Houston 12 Try to keep things simple Slotted pages Logical pages equated multiple physical pages 32 John Houston 31 M.Kersten 2008 Mary Houston 13 Avoid things you don’t always need Not all attributes are equally important M.Kersten 2008 14 Avoid moving too much around A column orientation is as simple and acts like an array Attributes of a tuple are correlated by offset M.Kersten 2008 15 Try to keep things simple • MonetDB Binary Association Tables ID 10 11 12 13 OID ID 100 101 102 103 104 M.Kersten 2008 OID 10 11 12 13 14 Day Discount 4/4/98 0.195 9/4/98 0.065 1/2/98 0.175 7/2/98 0 Day 100 4/4/98 101 9/4/98 102 1/2/98 103 7/2/98 104 1/2/99 OID Discount 100 0.195 101 0.065 102 0.175 103 0 104 0.065 16 Try to avoid doing things twice Physical data organization • Binary Association Tables head tail 100 Dense sequence 101 102 103 104 10 11 12 13 14 Memory mapped files Bat Unit fixed size M.Kersten 2008 17 Try to avoid doing things twice • Binary Association Tables accelerators OID ID 100 101 102 103 Hash-based access M.Kersten 2008 104 10 11 12 13 14 Column properties: key-ness non-null dense ordered 18 Try to avoid doing things twice • Binary Association Tables storage control OID 100 ID 100 101 102 103 Type remappings are used to squeeze space 104 Amsterdam Seattle New York London Paris ID Amsterdam Seattle New York London A BAT can be used as an encoding table A VID datatype can be used to represent dense enumerations M.Kersten 2008 19 Mantra: Try to keep things simple • Column orientation benefits datawarehousing • Brings a much tighter packaging and improves transport through the memory hierarchy • Each column can be more easily optimized for storage using compression schemes • Each column can be replicated for read-only access M.Kersten 2008 20 Try to maximize performance Volcano model Execution Paradigm Materialize All Model Vectorized model M.Kersten 2008 22 Volcano Engines Query SELECT name, salary*.19 AS tax FROM employee WHERE age > 25 M.Kersten 2008 23 Volcano Engines Operators Iterator interface -open() -next(): tuple -close() M.Kersten 2008 24 Volcano Engines Primitives Provide computational functionality All arithmetic allowed in expressions, e.g. multiplication mult(int,int) int M.Kersten 2008 25 Try to maximize performance Volcano paradigm • The Volcano model is based on a simple pullbased iterator model for programming relational operators. • The Volcano model minimizes the amount of intermediate store • The Volcano model is CPU intensive and can be inefficient M.Kersten 2008 26 Try to use simple a software pattern MonetDB paradigm • The MonetDB kernel is a programmable relational algebra machine • Relational operators operate on ‘array’-like structures M.Kersten 2008 27 Try to use simple a software pattern Operator implementation • All algebraic operators materialize their result • • • • GOOD: small code footprints GOOD: potential for re-use BAD : extra storage for intermediates BAD: cpu cost for retaining it • Local optimization decisions • • • • Sortedness, uniqueness, hash index Sampling to determine sizes Parallelism options Properties that affect the algorithms M.Kersten 2008 28 Try to use simple a software pattern Operator implementation • All algebraic operators materialize their result • Local optimization decisions • Heavy use of code expansion to reduce cost • • • • • 55 selection routines 149 unary operations 335 join/group operations 134 multi-join operations 72 aggregate operations M.Kersten 2008 29 Try to avoid the search space trap Database Structures Execution Paradigm Query optimizer DBMS Architecture M.Kersten 2008 30 Query optimization • Alternative ways of evaluating a given query • Equivalent expressions • Different algorithms for each operation (Chapter 13) • Cost difference between a good and a bad way of evaluating a query can be enormous • Example: performing a r X s followed by a selection r.A = s.B is much slower than performing a join on the same condition • Need to estimate the cost of operations • Depends critically on statistical information about relations which the database must maintain • Need to estimate statistics for intermediate results to compute cost of complex expressions Introduction (Cont.) Relations generated by two equivalent expressions have the same set of attributes and contain the same set of tuples, although their attributes may be ordered differently. Introduction (Cont.) • Generation of query-evaluation plans for an expression involves several steps: 1. Generating logically equivalent expressions • Use equivalence rules to transform an expression into an equivalent one. 2. Annotating resultant expressions to get alternative query plans 3. Choosing the cheapest plan based on estimated cost • The overall process is called cost based optimization. Equivalence Rules 1. Conjunctive selection operations can be deconstructed into a sequence of individual selections. ( E ) ( ( E )) 2. 2. Selection operations are commutative. 1 2 1 2 ( ( E )) ( ( E )) 1 2 2 1 3. Only the last in a sequence of projection operations is needed, the others can be omitted. t1 (t2 ((tn (E )))) t1 (E ) 4. Selections can be combined with Cartesian products and theta joins. a. b. (E1 X E2) = E1 E2 1(E1 2 E2) = E1 1 2 E2 Equivalence Rules (Cont.) 5. Theta-join operations (and natural joins) are commutative. E1 E2 = E2 E1 6. (a) Natural join operations are associative: (E1 E2) E3 = E1 (E2 E3) (b) Theta joins are associative in the following manner: (E1 1 E2) 2 3 E3 = E1 2 3 (E2 2 E3) where 2 involves attributes from only E2 and E3. Pictorial Depiction of Equivalence Rules Equivalence Rules (Cont.) 7. The selection operation distributes over the theta join operation under the following two conditions: (a) When all the attributes in 0 involve only the attributes of one of the expressions (E1) being joined. 0E1 E2) = (0(E1)) E2 (b) When 1 involves only the attributes of E1 and 2 involves only the attributes of E2. 1 E1 E2) = (1(E1)) ( (E2)) Equivalence Rules (Cont.) 8. The projections operation distributes over the theta join operation as follows: (a) if it involves only attributes from L1 L2: L1 L2 ( E1....... E2 ) ( L1 ( E1 ))...... ( L2 ( E2 )) (b) Consider a join E1 E2. • Let L1 and L2 be sets of attributes from E1 and E2, respectively. • Let L3 be attributes of E1 that are involved in join condition , but are not in L1 L2, and • let L4 be attributes of E2 that are involved in join condition , but are not in L1 L2. L1 L2 ( E1..... E2 ) L1 L2 (( L1 L3 ( E1 ))...... ( L2 L4 ( E2 ))) Equivalence Rules (Cont.) 9. The set operations union and intersection are commutative E 1 E2 = E2 E 1 E 1 E2 = E2 E 1 (set difference is not commutative). 10.Set union and intersection are associative. (E1 E2) E3 = E1 (E2 E3) (E1 E2) E3 = E1 (E2 E3) 11.The selection operation distributes over , and –. (E1 – E2) = (E1) – (E2) and similarly for and in place of – Also: (E1 – E2) = (E1) – E2 and similarly for in place of –, but not for 12. The projection operation distributes over union L(E1 E2) = (L(E1)) (L(E2)) Multiple Transformations (Cont.) Optimizer strategies • Heuristic • Apply the transformation rules in a specific order such that the cost converges to a minimum • Cost based • Simulated annealing • Randomized generation of candidate QEP • Problem, how to guarantee randomness Memoization Techniques • How to generate alternative Query Evaluation Plans? • Early generation systems centred around a tree representation of the plan • Hardwired tree rewriting rules are deployed to enumerate part of the space of possible QEP • For each alternative the total cost is determined • The best (alternatives) are retained for execution • Problems: very large space to explore, duplicate plans, local maxima, expensive query cost evaluation. • SQL Server optimizer contains about 300 rules to be deployed. Memoization Techniques • How to generate alternative Query Evaluation Plans? • Keep a memo of partial QEPs and their cost. • Use the heuristic rules to generate alternatives to built more complex QEPs • r1 r2 r3 r4 r4 r3 r2 r1 r1 r2 Level n plans Level 2 plans r3 r2 r3 r3 r4 x r1 r4 Level 1 plans Ditching the optimizers • Applications have different characteristics • Platforms have different characteristics • The actual state of computation is crucial • A generic all-encompassing optimizer costmodel does M.Kersten 2008 not work 44 Try to disambiguate decisions Code Inliner. Constant Expression Evaluator. Accumulator Evaluations. Strength Reduction. Common Term Optimizer. Join Path Optimizer. Ranges Propagation. Operator Cost Reduction. Foreign Key handling. Aggregate Groups. M.Kersten 2008 Code Parallizer. Replication Manager. Result Recycler. MAL Compiler. Dynamic Query Scheduler. Memo-based Execution. Vector Execution. Alias Removal. Dead Code Removal. Garbage Collector. 45 No data from persistent store to the memory trash Database Structures Execution Paradigm Query optimizer DBMS Architecture M.Kersten 2008 46 No data from persistent store to the memory trash Execution paradigms • The MonetDB kernel is set up to accommodate different execution engines • The MonetDB assembler program is • • • • Interpreted in the order presented Interpreted in a dataflow driven manner Compiled into a C program Vectorised processing • X100 project M.Kersten 2008 47 MonetDB/x100 X100 query engine Combine Volcano model with vector processing. All vectors together should fit the CPU cache CPU cache RAM Vectors are compressed ColumnBM (buffer manager) Optimizer should tune this, given the query characteristics. networked ColumnBM-s M.Kersten 2008 48 No data from persistent store to the memory trash • Varying the vector size on TPC-H query 1 mysql, oracle, low IPC, overhead db2 MonetDB X100 M.Kersten 2008 RAM bandwidth bound 49 Query evaluation strategy • Pipe-line query evaluation strategy • Called Volcano query processing model • Standard in commercial systems and MySQL • Basic algorithm: • Demand-driven evaluation of query tree. • Operators exchange data in units such as records • Each operator supports the following interfaces:– open, next, close • open() at top of tree results in cascade of opens down the tree. • An operator getting a next() call may recursively make next() calls from within to produce its next answer. • close() at top of tree results in cascade of close down the tree Query evaluation strategy • Pipe-line query evaluation strategy • Evaluation: • Oriented towards OLTP applications • Granule size of data interchange • Items produced one at a time • No temporary files • Choice of intermediate buffer size allocations • Query executed as one process • Generic interface, sufficient to add the iterator primitives for the new containers. • CPU intensive • Amenable to parallelization Query evaluation strategy • Materialized evaluation strategy • Used in MonetDB • Basic algorithm: • for each relational operator produce the complete intermediate result using materialized operands • Evaluation: • Oriented towards decision support queries • Limited internal administration and dependencies • Basis for multi-query optimization strategy • Memory intensive • Amendable for distributed/parallel processing Outline Database Structures Execution Paradigm Indexing M.Kersten 2008 53 Find a trusted fortune teller Materialized Views Indexing M.Kersten 2008 Cracking Traditional B-tree, Hash 54 Find a trusted fortune teller • Indices in database systems focus on: • All tuples are equally important for fast retrieval • There are ample resources to maintain indices • MonetDB does not rely on B-trees • MonetDB cracks the database into pieces M.Kersten 2008 55 Find a trusted fortune teller Database Cracking • Cracking in a database kernel? D1 • Every database scan is a physical reorganization • This is crazy… • Reorganization is utterly expensive… • Better to have many (update) users pay less then one (query) user a lot • It does not fit the Volcanostyle query processor.. • It just doesn’t work that way……. M.Kersten 2008 56 Find a trusted fortune teller • Cracking in a database kernel? • Every database scan is a physical reorganization D1 select * from D1 where ccd=‘j5’ M.Kersten 2008 57 Find a trusted fortune teller • Cracking in a database kernel? D1 • Every database scan is a physical reorganization D1 create view V as select * from D1 where ccd=‘j5’ select * from D1 where not ccd=‘j5’ M.Kersten 2008 58 Find a trusted fortune teller • Cracking in a database kernel? D1 • Every database scan is a physical reorganization D1 select * from D1 where ccd=‘j5’ select * from D1 where not ( ccd=‘j5’or ccd= ‘j7’) select * from D1 where ccd=‘j7’ M.Kersten 2008 60 Try to avoid useless investments • Cracking in a database kernel? D1 • Every database scan is a physical reorganization • Don’t pay the complete sort cost during update • Incremental predicate index maintenance • The first user pays ! M.Kersten 2008 61 Try to avoid useless investments A simple range query M.Kersten 2008 Try to avoid useless investments TPC-H query 6 M.Kersten 2008 Try to avoid useless investments • Cracking is easy in a column store and is part of the critical execution path • Cracking works under high volume updates M.Kersten 2008 64 Whoa MonetDB ! Speed lines ! PostgreSQL SQLserver Cstore MySQL DB2 M.Kersten 2008 65