Benchmarking XML storage systems
Information Systems Lab HS 2007
Final Presentation
© ETH Zürich | Benchmarking XML
11.02.08
Agenda
- Project Overview
  - Motivation
  - Goal of the Project
  - Benchmark Overview
- Results
  - RDBMS1
  - Sedna
  - MonetDB
Motivation
- Traditional DBMSs use the relational data model
- Vendors extend their systems to process XML or build new native XML stores
- XML processing is perceived to be slow
- Benchmarks for XML are only now being developed
Goal of the Project
- Analyse and compare the performance of different systems when processing XML
- Systems tested:
  - RDBMS1: a big player in the relational DBMS market that extended its product with XML capabilities
  - Sedna: a free native XML database designed to be a universal system for a wide range of XML applications
  - MonetDB: very fast compared to other XML databases, but supports only a small part of the XQuery functions
Benchmark
- Benchmark used: TPC-X
  - currently under development at ETH
  - models an Amazon-like online store in XML
  - the complete database is one XML file, e.g. users with their history, products with comments
  - complex queries that put stress on the query engine
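To make the "one XML file" setup concrete, here is a minimal Python sketch of what such a store document might look like and how a query has to walk it. The element and attribute names (`store`, `user`, `purchase`, `product`, `comment`) are hypothetical placeholders, not the actual TPC-X schema, and the queries use the limited XPath subset of `xml.etree.ElementTree` rather than XQuery.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a single-file online-store database.
# Element names are placeholders, NOT the real TPC-X schema.
STORE_XML = """
<store>
  <users>
    <user id="u1">
      <history>
        <purchase product="p1" price="12.50"/>
        <purchase product="p2" price="7.00"/>
      </history>
    </user>
    <user id="u2">
      <history>
        <purchase product="p1" price="12.50"/>
      </history>
    </user>
  </users>
  <products>
    <product id="p1"><comment user="u2">Great</comment></product>
    <product id="p2"/>
  </products>
</store>
"""

def revenue_per_product(root):
    """Sum purchase prices per product id across all users' histories."""
    totals = {}
    # ".//purchase" searches every user's history inside the one document.
    for purchase in root.iterfind(".//purchase"):
        pid = purchase.get("product")
        totals[pid] = totals.get(pid, 0.0) + float(purchase.get("price"))
    return totals

def commented_products(root):
    """Return the ids of products that have at least one comment."""
    # "product[comment]" keeps only product elements with a comment child.
    return [p.get("id") for p in root.iterfind("products/product[comment]")]

root = ET.fromstring(STORE_XML.strip())
print(revenue_per_product(root))  # {'p1': 25.0, 'p2': 7.0}
print(commented_products(root))   # ['p1']
```

Even this toy query touches users and products in the same document, which is the kind of cross-entity navigation the benchmark queries stress.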
RDBMS1
Impression of the System
- Almost all queries work with few changes
- Update queries were surprisingly easy to adapt
Impression of the System (contd.)
- Not supported:
  - typeswitch (limited schema support)
  - user-defined functions
Current Performance
- Data mining: about one order of magnitude slower than Sedna
- Update and search: seem a bit faster, but still slower than the other systems
Tuning possibilities
- Any XPath expression can be indexed
- Indexes seem to be based on rows rather than on trees
Issue with Indexing
- Indexes help only with "split" tables, but split tables are slower in general
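The slides only observe that RDBMS1's XML indexes appear to be row-based. Under that assumption, the Python sketch below illustrates why such an index cannot help when the whole database is one XML value in one row: an index that maps a path value to a row id only narrows the search to rows, and with a single row every lookup returns that same row, so the tree must still be scanned. With the data "split" into one row per user, the same index pinpoints a small row. `RowPathIndex` and the record layout are hypothetical; this is not RDBMS1's actual index structure.

```python
from collections import defaultdict

class RowPathIndex:
    """Hypothetical row-level index: maps a value found at a path to row ids.

    A hit tells us WHICH ROWS contain the value, not where inside the XML
    tree it sits, so each matching row must still be navigated in full.
    """

    def __init__(self, rows, extract):
        # rows: {row_id: payload}; extract: payload -> values at the indexed path
        self.rows = rows
        self.entries = defaultdict(set)
        for row_id, payload in rows.items():
            for value in extract(payload):
                self.entries[value].add(row_id)

    def lookup(self, value):
        """Return the ids of rows whose payload contains `value` at the path."""
        return sorted(self.entries.get(value, set()))

def user_ids(payload):
    # Stand-in for evaluating an XPath such as //user/@id on one row's XML.
    return [user["id"] for user in payload["users"]]

users = [{"id": f"u{i}", "history": []} for i in range(1000)]

# Layout A: the whole database is one XML document in a single row.
single = RowPathIndex({0: {"users": users}}, user_ids)
# Layout B: "split" tables, one row (one small document) per user.
split = RowPathIndex({i: {"users": [u]} for i, u in enumerate(users)}, user_ids)

def users_to_scan(index, value):
    """How many user subtrees must be navigated after the index lookup."""
    return sum(len(index.rows[r]["users"]) for r in index.lookup(value))

print(users_to_scan(single, "u42"))  # 1000: the one row holds every user
print(users_to_scan(split, "u42"))   # 1: the index pinpoints a single small row
```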
Issues
"When the only tool you own is a hammer, every problem begins to resemble a nail."
Abraham Maslow
Issues with Joins
- There is only a nested-loops join
- No index is used as soon as a join is needed
- Joins are needed for almost anything
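A nested-loops join compares every pair of input rows, so its cost grows with the product of the input sizes. That is one plausible reason why needing joins "for almost anything" makes RDBMS1 slow. The Python sketch contrasts it with a hash join, a standard alternative that the slides do not claim RDBMS1 offers; both produce the same result. The `users`/`purchases` records are hypothetical.

```python
def nested_loops_join(left, right, key_l, key_r):
    """Compare every left row with every right row: O(|left| * |right|)."""
    out, comparisons = [], 0
    for l in left:
        for r in right:
            comparisons += 1
            if l[key_l] == r[key_r]:
                out.append((l, r))
    return out, comparisons

def hash_join(left, right, key_l, key_r):
    """Build a hash table on one input, probe it with the other: O(|left| + |right|)."""
    table = {}
    for r in right:
        table.setdefault(r[key_r], []).append(r)
    out, probes = [], 0
    for l in left:
        probes += 1  # one hash-table probe per left row
        for r in table.get(l[key_l], []):
            out.append((l, r))
    return out, probes

users = [{"uid": i} for i in range(200)]
purchases = [{"buyer": i % 200, "pid": i} for i in range(1000)]

nl_rows, nl_work = nested_loops_join(users, purchases, "uid", "buyer")
hj_rows, hj_work = hash_join(users, purchases, "uid", "buyer")
print(len(nl_rows), nl_work)  # 1000 200000
print(len(hj_rows), hj_work)  # 1000 200
```

Both joins return the same 1'000 pairs; the nested-loops variant needs 200'000 comparisons to find them, and that number keeps growing quadratically as the store grows.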
Summary
- Almost anything works (even the adapter for XCheck!)
- Everything is slow
Conclusion
- RDBMS1 is not suited for the TPC-X benchmark
- XML storage is an improvement for relational data, but not a stand-alone system
Sedna
Overview
- Free native XML database
- No schema support
- Bulk load (native XML data storage)
- Document collections
- Indexing
- Full-text indexing (dtSearch)
Impression
- Good introductory example
- Little reference material
- Active development team
XQuery Support
- Most of the queries worked with a few changes
- Not supported:
  - schema import
  - FLWOR expressions with update statements
Indexing (Value Indices)
- Based on B-trees
- For element and attribute values
- Managing:
  - create an index on nodes by keys
  - the query executor does not use indexes automatically; queries must call the "index-scan" function in XQuery
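Sedna's value indices are B-tree based and must be invoked explicitly through `index-scan` in the query. The Python sketch below imitates that usage pattern, with a sorted key list searched by `bisect` standing in for the B-tree: the index is built once from (key, node) pairs, and the query code calls a scan function explicitly instead of the executor choosing the index. `ValueIndex`, `scan_eq`, `scan_range` and the sample nodes are hypothetical and do not reproduce Sedna's API.

```python
import bisect

class ValueIndex:
    """Sorted (key, node) pairs; bisect stands in for a B-tree's ordered search."""

    def __init__(self, pairs):
        # Sort once at creation time, like creating an index on nodes by keys.
        self.pairs = sorted(pairs, key=lambda kv: kv[0])
        self.keys = [k for k, _ in self.pairs]

    def scan_eq(self, key):
        """All nodes whose key equals `key`."""
        lo = bisect.bisect_left(self.keys, key)
        hi = bisect.bisect_right(self.keys, key)
        return [node for _, node in self.pairs[lo:hi]]

    def scan_range(self, low, high):
        """All nodes with low <= key < high, in key order."""
        lo = bisect.bisect_left(self.keys, low)
        hi = bisect.bisect_left(self.keys, high)
        return [node for _, node in self.pairs[lo:hi]]

# Hypothetical "sale" nodes keyed by month (an element value).
sales = [("2007-11", "sale#1"), ("2007-12", "sale#2"), ("2007-11", "sale#3"),
         ("2008-01", "sale#4")]
by_month = ValueIndex(sales)

# The query must call the index explicitly; nothing picks it automatically.
print(by_month.scan_eq("2007-11"))                # ['sale#1', 'sale#3']
print(by_month.scan_range("2007-12", "2008-02"))  # ['sale#2', 'sale#4']
```

Each lookup is a logarithmic search instead of a scan over all nodes, which is the effect the value indices have on the timings that follow.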
Indexing (cont.)
gainsPerMonth       100   1'000  10'000  50'000 100'000
Normal             0.36   27.53
With Indices       0.08    0.52    5.14   25.71   65.58
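Reading the table under two assumptions (the column headings are dataset sizes, and all values are execution times in the same unit), a few lines of Python quantify the effect of the indices. They compute the speedup where both measurements exist, and the log-log growth exponent k in t ∝ n^k between two sizes, where k close to 1 means linear growth. The helper names are hypothetical.

```python
import math

# Execution times for gainsPerMonth as read from the table above (unit not stated).
normal = {100: 0.36, 1_000: 27.53}
indexed = {100: 0.08, 1_000: 0.52, 10_000: 5.14, 50_000: 25.71, 100_000: 65.58}

def speedup(size):
    """How many times faster the indexed run is at a given dataset size."""
    return normal[size] / indexed[size]

def growth_exponent(times, small, large):
    """Exponent k such that t grows like n**k between two dataset sizes."""
    return math.log(times[large] / times[small]) / math.log(large / small)

for size in sorted(normal):
    print(f"{size:>6}: speedup {speedup(size):.1f}x")
print(f"normal  k = {growth_exponent(normal, 100, 1_000):.2f}")
print(f"indexed k = {growth_exponent(indexed, 1_000, 100_000):.2f}")
```

With these numbers the speedup is about 4.5x at 100 and about 53x at 1'000, and the exponent is roughly 1.9 without indices versus roughly 1.05 with them, i.e. close to quadratic versus close to linear growth.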
Indexing (Full-Text Indices)
- Sedna provides full-text indices via dtSearch
- dtSearch: a commercial text retrieval engine, no free download
Conclusion
- Easy to get started with the system
- Little reference material
- Most of the queries work with a few changes
- Execution time grows much faster than the dataset size
- Value indices deliver better execution times
MonetDB
Overview & impression of the system
- Well-documented installation and usage
- Many XQuery features not supported
- Good performance
- XML Schema support, but no noticeable effect on performance or functionality
- No support for user-defined indexing ("automatic and self-tuning indexes")
Architecture
- MonetDB: open-source database system for high-performance applications in data mining, OLAP, XML query, text and multimedia retrieval. It provides the database functionality through the MIL interface (MonetDB Interpreter Language).
- Pathfinder: XQuery compiler that translates XQuery expressions into relational algebra and calls MIL functions.
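Compiling XQuery to relational algebra requires storing the XML tree as a table. A widely used encoding for this, and, as far as we know, the family of encodings the MonetDB/XQuery work builds on, numbers nodes in document order (pre), records each node's subtree size and depth (level), and turns the descendant axis into a range predicate: d is a descendant of v exactly when pre(v) < pre(d) <= pre(v) + size(v). The Python sketch below is a minimal illustration of that idea under this assumption, not Pathfinder's actual schema or algebra.

```python
import xml.etree.ElementTree as ET

def encode(root):
    """Shred an element tree into rows (pre, size, level, tag) in document order."""
    rows = []

    def visit(elem, level):
        pre = len(rows)
        rows.append([pre, 0, level, elem.tag])  # size is filled in after the children
        for child in elem:
            visit(child, level + 1)
        rows[pre][1] = len(rows) - pre - 1      # number of descendants

    visit(root, 0)
    return [tuple(r) for r in rows]

def descendants(rows, pre_v, tag):
    """Descendant axis as a range selection: pre(v) < pre(d) <= pre(v) + size(v)."""
    _, size_v, _, _ = rows[pre_v]  # row position equals pre rank
    return [r for r in rows if pre_v < r[0] <= pre_v + size_v and r[3] == tag]

doc = ET.fromstring("<store><user><buy/><buy/></user><user><buy/></user></store>")
table = encode(doc)
for row in table:
    print(row)
# "buy" descendants of the first user (pre = 1)
print(descendants(table, 1, "buy"))
```

Because a tree step becomes a selection on integer columns, a relational engine can evaluate it with ordinary scans and sorts, which is what lets a compiler like Pathfinder hand XQuery work to MonetDB.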
XQuery support
"… quite complete support for XQuery language …" (monetdb.cwi.nl)
Unsupported functions:
- Date/time functions (0/76)
- String functions (21/32): fn:contains, fn:tokenize
- Sequence functions (11/19): fn:insert-before
- …
XML data import
- Documents are imported with pf:add-doc("url", "file", x%)
- x > 0 is needed for update queries, so XCheck had to be adapted
- The influence of x on performance is not clear
Performance
"...often achieves a 10fold raw speed improvement for SQL and XQuery over competitor RDBMSs..." (monetdb.cwi.nl)
Scalability
Conclusions
- Very fast; good for large documents and expensive queries
- Small documents: no drawback compared to other DBMSs
- Big problem: lack of function support
If XQuery function support improves, it is probably the database of our choice!
Project Summary
- RDBMS1: slow, but can process almost anything; XML as a feature.
- Sedna: quite fast, can process a reasonable part of XML.
- MonetDB: very fast, but only limited capabilities.