Lore Presentation paper
Introduction
By the early 1990s, many people argued that traditional databases faced a key limitation: they force all data to adhere to an explicitly specified schema, which is inconvenient when data elements and structures change as an application evolves. This gave birth to the concept of schemaless, self-describing semistructured data. Since no DBMS for semistructured data existed at the time, the Stanford Database Group, with research funding from DARPA, NASA, and others, designed Lore, a DBMS built specifically for managing semistructured information. Lore was a pioneering project: it was built entirely from scratch and first released in 1995. In 1999, Lore was modified to fully support XML. The rest of this paper describes Lore’s data model, its query language, and its general architecture.
Lore
Lore Data Model
OEM (Object Exchange Model) is the basic data model implemented in Lore. It is a simple, self-describing, nested object model for semistructured data. Data in this model can be thought of as a labeled directed graph whose vertices are objects. Each object has a unique object identifier (oid), such as &5. Atomic objects have no outgoing edges and hold a value of an atomic type (such as int, real, string, or gif), while all objects that have outgoing edges are called complex objects. In summary, an OEM object has:
• Label: a character string (object aliases)
• OID: a unique object identifier
• Type: atomic (int, real, string, ...) or complex
• Value: for a complex object, a list of OIDs; for an atomic object, an atomic value of type int, real, string, ...
[Figure 1: Logical graph representation of an OEM database]
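The four components listed above map naturally onto a small record type. The following Python sketch is only an illustration, not Lore’s actual storage format; the class `OEMObject`, the `kind` field (named to avoid shadowing Python’s `type`), the `check` helper, and the sample oids and values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

Atomic = Union[int, float, str]   # int, real, string; other atomic types (gif, ...) omitted


@dataclass
class OEMObject:
    oid: str                          # unique object identifier, e.g. "&5"
    label: str                        # character-string label
    kind: str                         # "complex", or an atomic type name such as "int"
    value: Union[List[str], Atomic]   # child oids if complex, otherwise an atomic value

    def is_atomic(self) -> bool:
        return self.kind != "complex"   # atomic objects have no outgoing edges


def check(db: Dict[str, OEMObject]) -> None:
    """Verify the invariants above: atomic values are not oid lists, children exist."""
    for obj in db.values():
        if obj.is_atomic() and isinstance(obj.value, list):
            raise ValueError(f"{obj.oid}: atomic object with outgoing edges")
        if not obj.is_atomic() and not all(c in db for c in obj.value):
            raise ValueError(f"{obj.oid}: dangling child oid")


# Hypothetical fragment; oids and values are invented for illustration.
DB = {o.oid: o for o in [
    OEMObject("&1", "DBGroup", "complex", ["&2"]),
    OEMObject("&2", "Member", "complex", ["&3", "&4"]),
    OEMObject("&3", "Age", "int", 45),
    OEMObject("&4", "Office", "string", "Gates252"),
]}
check(DB)
```

Keeping complex values as plain lists of oids mirrors the summary above: the graph structure lives entirely in those lists, and labels are read from the child objects.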
Lore Query Language
Lorel is Lore’s query language. It is based on OQL and supports path expressions for traversing the OEM graph. A path expression in Lorel is a name followed by a sequence of labels. For example, in Figure 1, DBGroup.Member.Office denotes the set of objects that can be reached by starting at the DBGroup object and following edges labeled Member and then Office. Range variables can be bound to path expressions; for example, the variable X in “DBGroup.Member.Office X” ranges over that same set of objects. Path expressions are used directly in queries, in an SQL style:
select DBGroup.Member.Office
where DBGroup.Member.Age > 30
According to Figure 1, the result of this query would be:
Office “Gates252”
Office
    Building “CIS”
    Room “411”
Internally, the previous query is rewritten in OQL style:
select O
from DBGroup.Member M, M.Office O
where exists y in M.Age : y > 30
The comparison on Age is transformed into an existential condition, so a user can ask DBGroup.Member.Age < 30 regardless of whether Age is single-valued, set-valued, or unknown.
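To make these semantics concrete, here is a minimal Python sketch that evaluates path expressions over a toy OEM graph and then runs the rewritten query with an existential test on Age. The graph, its oids, and the helper names (`sub`, `eval_path`, `offices_of_members_older_than`) are hypothetical; only the query shape comes from the text above.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical OEM fragment in the spirit of Figure 1 (oids and most values made up):
# oid -> (label, value); a complex value is a list of child oids.
DB: Dict[str, Tuple[str, Union[List[str], int, str]]] = {
    "&1": ("DBGroup", ["&2", "&3", "&4"]),
    "&2": ("Member", ["&5", "&6"]),            # single-valued Age
    "&3": ("Member", ["&7", "&8", "&9"]),      # set-valued Age (two Age edges)
    "&4": ("Member", ["&10"]),                 # no Age at all
    "&5": ("Age", 45), "&6": ("Office", "Gates252"),
    "&7": ("Age", 28), "&8": ("Age", 36),
    "&9": ("Office", ["&11", "&12"]),
    "&10": ("Office", "Gates200"),
    "&11": ("Building", "CIS"), "&12": ("Room", "411"),
}
NAMES = {"DBGroup": "&1"}   # names are the entry points of path expressions


def sub(oid: str, label: str) -> List[str]:
    """Children of oid along edges labeled `label`; atomic objects have none."""
    value = DB[oid][1]
    return [c for c in value if DB[c][0] == label] if isinstance(value, list) else []


def eval_path(path: str) -> Set[str]:
    """A path expression is a name followed by labels, e.g. DBGroup.Member.Office."""
    name, *labels = path.split(".")
    frontier = {NAMES[name]}
    for label in labels:
        frontier = {c for oid in frontier for c in sub(oid, label)}
    return frontier


def offices_of_members_older_than(threshold: int) -> List[str]:
    """select O from DBGroup.Member M, M.Office O where exists y in M.Age : y > threshold"""
    result = []
    for m in sorted(eval_path("DBGroup.Member")):              # range variable M
        if any(DB[y][1] > threshold for y in sub(m, "Age")):   # 0, 1 or many Age values
            result.extend(sub(m, "Office"))                    # range variable O
    return result


print(offices_of_members_older_than(30))
```

Member &3 qualifies through its second Age value, and member &4, which has no Age at all, is simply skipped rather than causing an error. That is the behavior the existential rewrite is meant to provide. With this toy data the two qualifying offices (&6 and &9) have the same shape as the two Office results shown earlier.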
Lore’s architecture
Lore’s architecture is divided into three main layers:
• API: a set of objects that lets users access Lore, run queries, and make changes to the data managed by the Lore engine.
• Query Compilation: a query is first parsed and then transformed into an OQL-like query. From this query, a query plan is built out of query operators. The plan is then passed to the query optimizer, which looks for further improvements to it. Finally, the query plan is passed to the Data Engine.
• Data Engine: this layer consists of the query operators module, which executes the query plan received from the Query Compilation layer, and the object manager, the lower-level interface that interacts with physical storage, storing OEM objects into pages and retrieving them from pages.
Query Compilation
To generate a query plan, Lore takes the rewritten query from the previous examples, shown again below; the resulting plan appears in Figure 3:
select O
from DBGroup.Member M, M.Office O
where exists y in M.Age : y > 30
Lore uses a recursive iterator approach to execute query plans. Execution begins at the top of the query plan: each node requests one tuple at a time from its children, performs some operation on the tuple(s), and then passes the result tuples up to its parent.
These tuples are data structures called OAs (Object Assignments). An OA contains a slot for each range variable, plus additional slots depending on the query. Each slot holds the oid of a vertex on a path being considered by the query engine. By the end of the query, the OAs that reach the top of the plan are complete.
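The pull-based execution and the OA structure can be sketched with a tiny iterator interface. This is a hypothetical Python illustration, not Lore’s own code: the classes `Operator`, `Source`, `Select`, and `Project`, the two-slot OA layout [M, O], and the `AGE` table are assumptions, and `Source` stands in for the Scan operators of a real plan.

```python
from typing import Callable, List, Optional

# An OA (Object Assignment): one slot per range variable (plus any extra slots
# the query needs); each slot holds an oid, or None while still unassigned.
OA = List[Optional[str]]


class Operator:
    """Iterator-model plan node: a parent pulls one OA at a time via next()."""
    def next(self) -> Optional[OA]:
        raise NotImplementedError


class Source(Operator):
    """Leaf handing out a fixed list of OAs (stands in for real Scan operators)."""
    def __init__(self, oas: List[OA]):
        self._it = iter(oas)

    def next(self) -> Optional[OA]:
        return next(self._it, None)


class Select(Operator):
    def __init__(self, child: Operator, pred: Callable[[OA], bool]):
        self.child, self.pred = child, pred

    def next(self) -> Optional[OA]:
        while (oa := self.child.next()) is not None:   # request one OA at a time
            if self.pred(oa):
                return oa                               # pass it up to the parent
        return None                                     # child exhausted


class Project(Operator):
    def __init__(self, child: Operator, slot: int):
        self.child, self.slot = child, slot

    def next(self) -> Optional[OA]:
        oa = self.child.next()
        return None if oa is None else [oa[self.slot]]


def run(root: Operator) -> List[OA]:
    """Execution starts at the top of the plan and pulls until it is exhausted."""
    out = []
    while (oa := root.next()) is not None:
        out.append(oa)
    return out


# Usage with hypothetical data: slots are [M, O]; AGE maps member oids to ages.
AGE = {"&2": 45, "&3": 36, "&4": 24}
plan = Project(Select(Source([["&2", "&6"], ["&3", "&8"], ["&4", "&10"]]),
                      lambda oa: AGE[oa[0]] > 30), slot=1)
result = run(plan)
print(result)  # [['&6'], ['&8']]
```

Because each node only asks its child for the next OA when it needs one, no intermediate result is materialized; the `run` loop at the root drives the whole plan.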
As Figure 3 shows, the query plan uses query operators such as Scan, Project, and Join.
The Scan operator returns all oids that are sub-objects of a given object following a specified path expression:
• Scan(StartingOASlot, Path_expression, TargetOASlot)
• For each oid in StartingOASlot, Scan finds the sub-objects reachable via Path_expression and places each of their oids into TargetOASlot.
The Join operator works as a nested loop: for each OA returned by its left child, it calls its right child repeatedly until no more OAs are returned.
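A minimal sketch of Scan and the nested-loop Join, assuming a toy in-memory graph and OAs represented as Python tuples of slots. The function names, the slot layout (root, M, O), and the data are hypothetical; Lore’s real operators work on stored OEM objects through the object manager.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical OEM fragment: oid -> (label, value); complex values list child oids.
DB = {
    "&1": ("DBGroup", ["&2", "&3"]),
    "&2": ("Member", ["&4", "&5"]),
    "&3": ("Member", ["&6", "&7"]),
    "&4": ("Age", 45), "&5": ("Office", "Gates252"),
    "&6": ("Age", 24), "&7": ("Office", "Gates200"),
}
OA = Tuple[Optional[str], ...]   # slot i holds an oid or None


def reachable(oid: str, path: List[str]) -> List[str]:
    """Oids reached from `oid` by following the labels in `path`, in order."""
    frontier = [oid]
    for label in path:
        frontier = [c for o in frontier if isinstance(DB[o][1], list)
                    for c in DB[o][1] if DB[c][0] == label]
    return frontier


def scan(start_slot: int, path: List[str], target_slot: int, oa: OA) -> Iterator[OA]:
    """Scan(StartingOASlot, Path_expression, TargetOASlot) applied to one input OA."""
    for child in reachable(oa[start_slot], path):
        out = list(oa)
        out[target_slot] = child          # fill the target slot with a reachable oid
        yield tuple(out)


def join(left: Iterator[OA], right: Callable[[OA], Iterator[OA]]) -> Iterator[OA]:
    """Nested-loop join: for each left OA, drain a fresh right iterator."""
    for l_oa in left:
        yield from right(l_oa)            # right child runs until it returns no more OAs


start: OA = ("&1", None, None)            # slots: (root, M, O)
plan = join(scan(0, ["Member"], 1, start), lambda oa: scan(1, ["Office"], 2, oa))
rows = list(plan)
print(rows)  # [('&1', '&2', '&5'), ('&1', '&3', '&7')]
```

The right child is rebuilt for every left OA because its Scan starts from slot M, which the left side has just filled in.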
The aggregation operator (Aggr) places the result of the aggregation into the target slot. The Join, Project, and Select operators are almost identical to their relational counterparts. Other operators include CreateSet, GroupBy, and ArithOp.
Once a query plan has been generated, it is passed to the query optimizer, which:
• pushes selection operators down the query plan tree;
• eliminates or combines redundant query operators;
• explores query plans that use indexes when possible.
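The first optimization, pushing selections down, can be illustrated with a generic rewrite rule over a toy plan tree: a Select moves below a Join whenever every slot its predicate reads is produced on one side of that Join. This is a textbook-style sketch with hypothetical plan classes, not Lore’s actual optimizer, and predicates are kept as plain text.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Scan:
    start: str    # slot the scan starts from
    path: str     # path expression to follow
    target: str   # slot it fills


@dataclass(frozen=True)
class Join:
    left: "Plan"
    right: "Plan"


@dataclass(frozen=True)
class Select:
    child: "Plan"
    slots: FrozenSet[str]   # slots the predicate reads
    pred: str               # predicate kept as text in this sketch


Plan = Union[Scan, Join, Select]


def produced(p: Plan) -> FrozenSet[str]:
    """Slots that a subplan fills in."""
    if isinstance(p, Scan):
        return frozenset({p.target})
    if isinstance(p, Join):
        return produced(p.left) | produced(p.right)
    return produced(p.child)


def push_down(p: Plan) -> Plan:
    """Move each Select below Joins as far as the slots it reads allow."""
    if isinstance(p, Join):
        return Join(push_down(p.left), push_down(p.right))
    if isinstance(p, Select):
        child = push_down(p.child)
        if isinstance(child, Join):
            if p.slots <= produced(child.left):
                return Join(push_down(Select(child.left, p.slots, p.pred)), child.right)
            if p.slots <= produced(child.right):
                return Join(child.left, push_down(Select(child.right, p.slots, p.pred)))
        return Select(child, p.slots, p.pred)
    return p   # Scan leaves are unchanged


# Usage: a plan for the running query, with the Age test sitting at the top.
m = Scan("root", "DBGroup.Member", "M")
y = Scan("M", "Age", "Y")
o = Scan("M", "Office", "O")
before = Select(Join(Join(m, y), o), frozenset({"Y"}), "Y > 30")
after = push_down(before)
print(after)
```

After the rewrite the Age test sits directly above the Age scan, so OAs for members that fail it are discarded before the Office scan runs.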
Lore uses two kinds of indexes:
• Lindex (link index): given an OID and a label, returns the OIDs of all parents of that object via that label; it is implemented as a hash table.
• Vindex (value index): given a label and a condition, returns all atomic objects with that label whose values satisfy the condition; it is implemented as B+-trees.
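A rough sketch of both index kinds over a toy edge list. The Lindex is a Python dict (a hash table) keyed by (child oid, label). For the Vindex, a sorted list searched with the standard `bisect` module is swapped in for the B+-tree: it supports the same ordered range lookups but is not how Lore stores it. All names and data here are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical edges (parent oid, label, child oid) and atomic values.
EDGES = [("&1", "Member", "&2"), ("&1", "Member", "&3"),
         ("&2", "Age", "&4"), ("&3", "Age", "&5")]
ATOMIC = {"&4": 45, "&5": 24}

# Lindex: hash table from (child oid, label) to the oids of its parents via that label.
lindex: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
for parent, label, child in EDGES:
    lindex[(child, label)].add(parent)

# Vindex stand-in: per label, (value, oid) pairs kept in sorted order.
vindex: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
for parent, label, child in EDGES:
    if child in ATOMIC:
        bisect.insort(vindex[label], (ATOMIC[child], child))


def vindex_greater(label: str, bound: int) -> List[str]:
    """Atomic objects reached via `label` whose value is strictly greater than bound."""
    entries = vindex[label]
    # (bound, max char) sorts after every (bound, oid) entry, so equal values are skipped.
    i = bisect.bisect_right(entries, (bound, chr(0x10FFFF)))
    return [oid for _, oid in entries[i:]]


print(lindex[("&4", "Age")], vindex_greater("Age", 30))
```

A query such as `exists y in M.Age : y > 30` could use the Vindex to find qualifying Age objects first and then the Lindex to climb back to their Member parents, instead of scanning every member.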
External Data
The External Data module lets Lore retrieve information from other data sources, transparently to the user, by placing external objects in the OEM graph representation of the database. An external object in Lore is a “placeholder” for the external data and specifies how Lore interacts with the external data source.
During query processing, the Scan operator notifies the external data manager whenever it encounters an external object. The specification of an external object includes:
• the location of a wrapper program that fetches the data and converts it to OEM;
• a timeout interval;
• a set of arguments used to limit the information fetched from the external source.
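The external-object mechanism can be sketched as a placeholder carrying the three pieces of specification listed above, plus a manager that Scan would call. Here the wrapper program is modeled as a Python callable, and the timeout is interpreted as how long fetched data is reused before the wrapper is called again; both choices, and all names, are assumptions for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class ExternalObject:
    """Placeholder in the OEM graph for data living in an external source."""
    wrapper: Callable[..., List[Any]]   # stands in for the wrapper program's location
    timeout: float                      # seconds fetched data is reused (assumed meaning)
    args: Dict[str, Any]                # arguments limiting what is fetched
    _cache: Optional[List[Any]] = field(default=None, repr=False)
    _fetched_at: float = field(default=float("-inf"), repr=False)


class ExternalDataManager:
    """Hypothetical manager that Scan notifies when it meets an external object."""
    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self.calls = 0   # number of times a wrapper was actually invoked

    def resolve(self, ext: ExternalObject) -> List[Any]:
        now = self.clock()
        if ext._cache is None or now - ext._fetched_at > ext.timeout:
            ext._cache = ext.wrapper(**ext.args)   # fetch and convert to OEM-like data
            ext._fetched_at = now
            self.calls += 1
        return ext._cache


# Usage with a fake clock and a hypothetical wrapper returning (label, value) pairs.
fake_now = [0.0]
mgr = ExternalDataManager(clock=lambda: fake_now[0])
ext = ExternalObject(wrapper=lambda topic: [("Title", f"Report on {topic}")],
                     timeout=60, args={"topic": "Lore"})
print(mgr.resolve(ext))
```

Injecting the clock keeps the sketch testable; a real system would also need error handling for wrappers that fail or time out.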
Data Guides
A DataGuide is a concise and accurate summary of the structure of an OEM database, stored as an OEM database itself (somewhat like a system catalog), in which each possible path expression is encoded exactly once. DataGuides are very helpful because, without an explicit database schema:
• it is difficult to formulate meaningful queries;
• the query processor, with no knowledge of the database structure, may perform unnecessary work;
• work is wasted evaluating path expressions that do not exist in the database.
DataGuides are dynamically generated and maintained over an existing database, and they can store statistics, such as the number of atomic objects of each type reachable by a given path expression p.
[Figure: DataGuide generated over the database of Figure 1]
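One common way to build a DataGuide is analogous to turning a nondeterministic automaton into a deterministic one: for every label path, collect the set of objects it reaches, and merge paths whose target sets are equal. The sketch below follows that idea on a toy acyclic graph; the graph, function names, and the per-path object count used as a statistic are hypothetical, and the path enumerator assumes the DataGuide has no cycles.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical OEM graph: oid -> outgoing (label, child oid) edges; atomic objects have none.
GRAPH: Dict[str, List[Tuple[str, str]]] = {
    "&1": [("Member", "&2"), ("Member", "&3")],
    "&2": [("Name", "&4"), ("Office", "&5")],
    "&3": [("Name", "&6"), ("Office", "&7")],
    "&7": [("Building", "&8"), ("Room", "&9")],
    "&4": [], "&5": [], "&6": [], "&8": [], "&9": [],
}
Node = FrozenSet[str]   # a DataGuide node = set of database objects reached by a label path


def build_dataguide(root: str) -> Tuple[Node, Dict[Node, Dict[str, Node]]]:
    """Subset-construction style build: equal target sets become one node,
    so every label path in the database appears exactly once in the DataGuide."""
    start: Node = frozenset({root})
    edges: Dict[Node, Dict[str, Node]] = {}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in edges:
            continue                                   # already expanded
        edges[node] = {}
        for label in sorted({lab for oid in node for lab, _ in GRAPH[oid]}):
            target = frozenset(c for oid in node for lab, c in GRAPH[oid] if lab == label)
            edges[node][label] = target
            queue.append(target)
    return start, edges


def paths(node: Node, edges: Dict[Node, Dict[str, Node]], prefix: str) -> Dict[str, int]:
    """List each label path with a simple statistic: how many objects it reaches.
    Assumes an acyclic DataGuide; a cyclic one would need a depth limit."""
    out = {prefix: len(node)}
    for label, target in edges[node].items():
        out.update(paths(target, edges, f"{prefix}.{label}"))
    return out


start, edges = build_dataguide("&1")
for path, n in paths(start, edges, "DBGroup").items():
    print(path, n)
```

Even though the database has two Member objects, the DataGuide lists DBGroup.Member.Office only once, which is the property that makes it useful for query formulation and for pruning non-existent paths.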
Integrating XML into Lore
Similar to an OEM object, an XML element in Lore is a pair <EID, VALUE>, where EID is a unique element identifier and VALUE is either an atomic text string or a complex value containing:
• tag: a string value holding the XML tag;
• an ordered list of attribute-name/atomic-value pairs;
• an ordered list of cross-link sub-elements of the form <label, EID>, reachable via IDREF or IDREFS;
• an ordered list of sub-elements of the form <label, EID>.
Comments in the XML document are ignored.
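The element model above can be sketched with Python’s standard `xml.etree.ElementTree` parser, whose default configuration already drops comments. The sketch assigns EIDs in document order, keeps attributes and sub-elements in order, and turns IDREF(S) attributes into cross-links labeled with the attribute name. Which attributes are IDREFs would normally come from a DTD; here they are passed in through a hypothetical `idref_attrs` parameter, the attribute named `id` is assumed to be the ID, text-only leaf elements become atomic strings, and text inside complex elements is ignored.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import AbstractSet, Dict, List, Tuple, Union


@dataclass
class Complex:
    tag: str                                                          # XML tag name
    attributes: List[Tuple[str, str]] = field(default_factory=list)   # ordered (name, atomic value)
    crosslinks: List[Tuple[str, int]] = field(default_factory=list)   # ordered (label, EID) via IDREF(S)
    subelements: List[Tuple[str, int]] = field(default_factory=list)  # ordered (label, EID)


Value = Union[str, Complex]   # VALUE: atomic text string or complex value


def load(xml_text: str, idref_attrs: AbstractSet[str] = frozenset()) -> Dict[int, Value]:
    """Return a map EID -> VALUE for every element of the document."""
    root = ET.fromstring(xml_text)        # the default parser discards comments
    elements = list(root.iter())          # document order
    eid = {id(el): i + 1 for i, el in enumerate(elements)}
    by_id = {el.get("id"): eid[id(el)] for el in elements if el.get("id") is not None}
    store: Dict[int, Value] = {}
    for el in elements:
        if not el.attrib and len(el) == 0:       # text-only leaf -> atomic string (assumption)
            store[eid[id(el)]] = (el.text or "").strip()
            continue
        value = Complex(tag=el.tag)
        for name, val in el.attrib.items():
            if name in idref_attrs:              # IDREF/IDREFS: whitespace-separated ids
                value.crosslinks += [(name, by_id[ref]) for ref in val.split()]
            else:
                value.attributes.append((name, val))
        value.subelements = [(child.tag, eid[id(child)]) for child in el]
        store[eid[id(el)]] = value
    return store


doc = ('<DBGroup><!-- a comment --><Member id="m1" advisor="m2"><Name>Ann</Name></Member>'
       '<Member id="m2"><Name>Bob</Name></Member></DBGroup>')
print(load(doc, idref_attrs={"advisor"})[2])
```

An IDREF that points to a missing ID raises a `KeyError` in this sketch; a fuller loader would report it as a validation error.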
For the query language, path expressions were extended with qualifiers that distinguish between sub-elements and attributes:
• DBGroup.Member.>Name: the > qualifier explicitly restricts the step to sub-elements;
• DBGroup.Member.@Name: the @ qualifier explicitly restricts the step to attributes;
• DBGroup.Member.Name: when neither @ nor > is used, both attributes and sub-elements are matched.
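The three qualifier forms can be sketched as a single path-step function over a simplified element store. The store layout, the `step` and `evaluate` names, and the data (a Member that has both a Name attribute and a Name sub-element, to show the difference) are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# Simplified, hypothetical element store: EID -> complex value (dict) or atomic string.
STORE: Dict[int, Union[str, dict]] = {
    1: {"attrs": [], "subs": [("Member", 2)]},
    2: {"attrs": [("Name", "J. Smith")], "subs": [("Name", 3)]},
    3: "Jones",
}
NAMES = {"DBGroup": 1}
Item = Tuple[str, Union[int, str]]   # ("elem", EID) or ("attr", atomic value)


def step(items: List[Item], qlabel: str) -> List[Item]:
    """One path step: '>' matches sub-elements only, '@' attributes only, none both."""
    if qlabel.startswith(">"):
        subs, attrs, label = True, False, qlabel[1:]
    elif qlabel.startswith("@"):
        subs, attrs, label = False, True, qlabel[1:]
    else:
        subs, attrs, label = True, True, qlabel
    out: List[Item] = []
    for kind, ref in items:
        node = STORE[ref] if kind == "elem" else None
        if not isinstance(node, dict):
            continue   # attribute values and atomic elements have nothing below them
        if subs:
            out += [("elem", e) for lab, e in node["subs"] if lab == label]
        if attrs:
            out += [("attr", v) for name, v in node["attrs"] if name == label]
    return out


def evaluate(path: str) -> List[Item]:
    name, *qlabels = path.split(".")
    items: List[Item] = [("elem", NAMES[name])]
    for q in qlabels:
        items = step(items, q)
    return items


for p in ["DBGroup.Member.>Name", "DBGroup.Member.@Name", "DBGroup.Member.Name"]:
    print(p, evaluate(p))
```

The unqualified form returns both the sub-element and the attribute, which is why the qualifiers are needed when a label is used for both.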
For the DataGuide module, a DTD can be provided, from which Lore builds the corresponding DataGuide; otherwise, if no DTD is provided, a DataGuide is generated from the XML document itself. The two cases behave differently under updates:
• with a DTD provided, validity is assured;
• with no DTD, the DataGuide is updated as the XML document is updated.
Conclusion
Lore was originally developed for the OEM data model starting in 1995, and XML support was integrated later, in 1999. At that time, Lore provided a clear and robust solution for storing, querying, and updating semistructured data, and later XML. The Stanford Database Group effectively ended the Lore project in 2000, but it helped lead the way for XML to become a standard, and many DBMSs that manipulate XML drew inspiration from Lore.