Benchmarking XML storage systems
Information Systems Lab HS 2007
Final Presentation
© ETH Zürich | Benchmarking XML
11.02.08
Agenda
 Project Overview
    Motivation
    Goal of the Project
    Benchmark Overview
 Results
    RDBMS1
    Sedna
    MonetDB
Motivation
 Traditional DBMS use relational data model
 Vendors extend their systems to process XML or build new native stores
 XML processing is perceived to be slow
 Benchmarks for XML are just being developed
Goal of the Project
 Analyse and compare the performance of different systems for processing XML
 Systems tested:
    RDBMS1 – big player in the relational DBMS market that extended its product with XML capabilities
    Sedna – free native XML DB designed to be a universal system for a wide range of XML applications
    MonetDB – very fast compared to other XML DBs, but supports only a small part of the XQuery functions
Benchmark
 Benchmark used: TPC-X
 currently under development at ETH
 models an Amazon-like online store in XML
 complete database is one XML file
    e.g. users with history, products with comments
 complex queries that put stress on the query engine
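To make the data model more concrete, here is a minimal sketch of what a single-file store database and one query over it might look like. All element and attribute names are hypothetical and not taken from the benchmark; Python's standard XML library stands in for an XQuery engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a single-file store database. Every element and
# attribute name is invented for illustration, not taken from TPC-X.
STORE_XML = """
<store>
  <users>
    <user id="u1">
      <name>Alice</name>
      <history>
        <order product="p1" quantity="2"/>
        <order product="p2" quantity="1"/>
      </history>
    </user>
    <user id="u2">
      <name>Bob</name>
      <history>
        <order product="p1" quantity="1"/>
      </history>
    </user>
  </users>
  <products>
    <product id="p1" price="10.00">
      <title>XML Handbook</title>
      <comments>
        <comment user="u2">Useful reference.</comment>
      </comments>
    </product>
    <product id="p2" price="25.50">
      <title>Query Cookbook</title>
      <comments/>
    </product>
  </products>
</store>
"""


def spend_per_user(xml_text):
    """Total amount spent per user, joining order history with product prices."""
    root = ET.fromstring(xml_text)
    # Look up prices by product id so each order can be priced.
    prices = {p.get("id"): float(p.get("price")) for p in root.iter("product")}
    totals = {}
    for user in root.iter("user"):
        total = 0.0
        for order in user.iter("order"):
            total += prices[order.get("product")] * int(order.get("quantity"))
        totals[user.findtext("name")] = total
    return totals


totals = spend_per_user(STORE_XML)
print(totals)
```

Even this small query has to relate two separate subtrees of the same document (order history and products), which is the kind of work that stresses a query engine.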
RDBMS1
Impression of the System
 almost all queries work with few changes
 update queries were surprisingly easy to adapt
Impression of the System (contd.)
 not supported:
    type-switch (limited schema support)
    user-defined functions
Current Performance
 data mining
    about one order of magnitude slower than Sedna
 update and search
    seem a bit faster (but still slower than the others)
Tuning possibilities
 any XPath expression can be indexed
 indexes seem to be based on rows rather than on trees
Issue with Indexing
 Indexes help only with "split" tables, but they are slower in general
Issues
"When the only tool you own is a hammer, every problem begins to resemble a nail."
Abraham Maslow
Issues with Joins
 the only join algorithm available is the nested-loops join
 no index is used as soon as a join is needed
 joins are used for almost anything
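The cost of having only a nested-loops join can be illustrated with a small sketch. The hash join shown next to it is a comparison technique added here for illustration; the slides do not say that RDBMS1 offers it, and the sample rows are hypothetical.

```python
def nested_loop_join(left, right, left_key, right_key):
    """Compare every left row with every right row: len(left) * len(right) comparisons."""
    result = []
    for l_row in left:
        for r_row in right:
            if l_row[left_key] == r_row[right_key]:
                result.append((l_row, r_row))
    return result


def hash_join(left, right, left_key, right_key):
    """Build a hash table on the right input once, then probe it once per left row."""
    buckets = {}
    for r_row in right:
        buckets.setdefault(r_row[right_key], []).append(r_row)
    result = []
    for l_row in left:
        for r_row in buckets.get(l_row[left_key], []):
            result.append((l_row, r_row))
    return result


# Hypothetical sample data: orders referencing products by id.
orders = [
    {"user": "u1", "product": "p1"},
    {"user": "u2", "product": "p1"},
    {"user": "u1", "product": "p2"},
]
products = [
    {"id": "p1", "title": "XML Handbook"},
    {"id": "p2", "title": "Query Cookbook"},
]

nl_result = nested_loop_join(orders, products, "product", "id")
hj_result = hash_join(orders, products, "product", "id")
print(len(nl_result), nl_result == hj_result)
```

Both functions return the same pairs, but the nested-loops version does work proportional to the product of the input sizes, which becomes expensive when joins are needed for almost every query.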
Summary
 almost anything works (even the adapter for XCheck!)
 everything is slow
Conclusion
 RDBMS1 is not suited for the TPC-X benchmark
 XML storage works as an improvement for relational data, but not as a stand-alone system
Sedna
Overview
 Free native XML Database
 No Schema support
 Bulk-Load (native XML data storage)
 Document Collections
 Indexing
 Full-Text indexing (dtSearch)
Impression
 Good Introduction Example
 Little Reference Material
 Active Development Team
XQuery Support
 Most of the queries worked with a few changes
 Not supported:
    Schema Import
    FLWR-Expression with Update-Statement
Indexing (value Indices)
 Based on B-Tree
 For Elements and Attribute Values
 Managing:
    Create Index on Nodes by Keys
    The query executor does not use indexes automatically; the "index-scan" function must be called explicitly in XQuery
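The difference between a full scan and an explicit lookup through a value index can be sketched as follows. This is only an illustration of the idea: a sorted key list searched by bisection stands in for a B-tree, and the class, function, and field names are hypothetical and unrelated to Sedna's actual API.

```python
import bisect


class ValueIndex:
    """Sorted (key, position) pairs searched by bisection; a stand-in for a B-tree."""

    def __init__(self, records, key):
        pairs = sorted((rec[key], pos) for pos, rec in enumerate(records))
        self._keys = [k for k, _ in pairs]
        self._positions = [p for _, p in pairs]
        self._records = records

    def scan_equal(self, value):
        """Return all records whose indexed key equals `value`."""
        lo = bisect.bisect_left(self._keys, value)
        hi = bisect.bisect_right(self._keys, value)
        return [self._records[p] for p in self._positions[lo:hi]]


def full_scan(records, key, value):
    """Baseline without an index: inspect every record."""
    return [rec for rec in records if rec[key] == value]


# Hypothetical sample data.
orders = [
    {"month": "2007-11", "amount": 20.0},
    {"month": "2007-12", "amount": 5.0},
    {"month": "2007-11", "amount": 7.5},
    {"month": "2008-01", "amount": 12.0},
]

index = ValueIndex(orders, "month")
by_index = index.scan_equal("2007-11")   # the caller asks for the index explicitly
by_scan = full_scan(orders, "month", "2007-11")
print(by_index == by_scan, len(by_index))
```

As in the situation described on the slide, nothing here picks the index automatically: the query author decides to call `scan_equal` instead of scanning all records.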
Indexing (cont.)
gainsPerMonth    100     1’000    10’000    50’000    100’000
Normal           0.36    27.53
With Indices     0.08    0.52     5.14      25.71     65.58
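For the two dataset sizes where both a Normal and a With Indices value are listed (100 and 1’000), the gain from indexing can be worked out directly. The short sketch below uses only those four figures; the slide does not state units, so the values are treated simply as execution times in the same unit.

```python
# Figures for gainsPerMonth as listed on the slide (units not stated there).
normal = {100: 0.36, 1000: 27.53}
with_indices = {100: 0.08, 1000: 0.52}

# Speedup from the value index at each dataset size.
speedup = {size: normal[size] / with_indices[size] for size in normal}

# How much each variant slows down when the dataset grows tenfold.
growth_normal = normal[1000] / normal[100]
growth_indexed = with_indices[1000] / with_indices[100]

for size, factor in speedup.items():
    print(f"size {size}: index speedup x{factor:.1f}")
print(f"tenfold data: normal x{growth_normal:.1f}, indexed x{growth_indexed:.1f}")
```

With a tenfold increase in data, the unindexed time grows by a factor of roughly 76 while the indexed time grows by a factor of 6.5, so the benefit of the index widens as the dataset gets larger.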
Indexing (Full-Text Indices)
 Sedna provides Full-Text Indices with dtSearch
 dtSearch: commercial text retrieval engine
    No free download
Conclusion
 Easy to start with the system
 Little reference material
 Most of the queries work with a few changes
 Execution time grows exponentially with larger datasets
 Value indices deliver better execution times
MonetDB
Overview & impression of the system
 well documented installation / usage
 many XQuery features not supported
 good performance
 XML Schema support, but no noticeable effect on performance or functionality
 no support for user-defined indexing ("automatic and self-tuning indexes")
Architecture
 MonetDB: Open-source database system for high-performance applications in data mining, OLAP, XML Query, text and multimedia retrieval. Provides the database functionality through the MIL interface (MonetDB Interpreter Language).
 Pathfinder: XQuery compiler that translates XQuery expressions into relational algebra and calls MIL functions.
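To give an idea of how XML queries can be answered with relational operations at all, here is a minimal sketch of one well-known approach: each node becomes a row with its document-order rank, subtree size, and depth, so that a descendant step turns into a simple range selection. The slides do not describe the storage layout, so treating this as the scheme used here is an assumption, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def encode(root):
    """Flatten an element tree into (pre, size, level, tag) rows in document order."""
    rows = []

    def visit(elem, level):
        pre = len(rows)
        rows.append(None)  # reserve the slot; size is known once children are visited
        for child in elem:
            visit(child, level + 1)
        size = len(rows) - pre - 1  # number of descendants of this node
        rows[pre] = (pre, size, level, elem.tag)

    visit(root, 0)
    return rows


def descendants(rows, pre):
    """Descendant step as a range selection: rows with pre < p <= pre + size."""
    size = rows[pre][1]
    return [row for row in rows if pre < row[0] <= pre + size]


# Hypothetical sample document.
doc = ET.fromstring(
    "<store><users><user/><user/></users><products><product/></products></store>"
)
rows = encode(doc)
for row in rows:
    print(row)
print([row[3] for row in descendants(rows, 1)])
```

Once the tree is stored as rows like these, path steps become selections and joins over a table, which a relational engine can process with its usual operators.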
XQuery support
… quite complete support for XQuery language…
monetdb.cwi.nl
Not supported functions:
 Date/Time functions (0/76)
 String functions (21/32) fn:contains, fn:tokenize
 Sequence functions (11/19) fn:insert-before
 …
XML data import
 pf:add-doc("url", "file", x%)
 need x > 0 for update queries
 -> need to adapt XCheck
 influence on performance not clear
Performance
...often achieves a 10fold raw speed improvement for SQL and XQuery over competitor RDBMSs...
monetdb.cwi.nl
Scalability
Conclusions
 Very fast, good for large documents and expensive queries
 Small documents: no drawback compared to other DBMSs
 Big problem: lack of function support
If XQuery function support gets better, it’s probably the database of our choice!
Project Summary
 RDBMS1
    slow, but can process almost anything
    XML as a feature
 Sedna
    quite fast, can process a reasonable part of XML
 MonetDB
    very fast, but only limited capabilities