LORE
Lightweight Object Repository
by
Othman Chhoul
CSC5370
Fall 2003
Outline
Introduction
What is Lore?
History
Lore’s Forensic
Conclusion
Questions
Demo
Introduction
Limitations faced by traditional databases:
They force all data to adhere to an explicitly specified schema.
Data elements may change.
Structures may change along the execution path of an application.
Deciding on a fixed schema for irregular or unstable data is a headache.
SemiStructured Data
Widespread SemiStructured Data:
“Self-describing”
“Schemaless”
Examples:
Data from the web
Overall site structure may change often.
It would be nice to be able to query a web site.
Data integrated from multiple, heterogeneous data sources.
Information sources change, or new sources are added.
What is Lore?
Lore is a DBMS designed specifically
for managing semistructured
information, such as XML
Among the Pioneers in this domain
History
Built from scratch by the DB Group at Stanford University, with research funding from DARPA, NASA and others.
Introduced in 1995 with the first version of its query language, Lorel, and with OEM as its data model.
A lightweight system, because it was designed for single-user, read-only access.
1999: changed to support XML.
Lore’s Forensic
Lore’s Data model
Lore’s Query Language
Lore’s General Architecture
When XML gets into action
OEM (Object Exchange Model)
Simple, self-describing, nested object model for semistructured data (XML???)
Data in this model can be thought of as a labeled directed graph.
Vertices in the graph are objects.
Each object has a unique object identifier (oid), such as &5.
Atomic objects have no outgoing edges and have types such as int, real, string, gif, etc.
All other objects, which have outgoing edges, are called complex objects.
OEM (Summary)
An OEM object has:
Label: a character string, object aliases
OID: Object unique identifier
Type: Atomic (int, real, string), Complex
Value: for a complex object, a list of OIDs; for an atomic object, an atomic value of type int, real, string…
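The summary above can be sketched as a small data structure. This is only an illustration: the class name `OEMObject`, the choice to attach labels to edges (as the labeled directed graph view suggests), and the sample oids and values are all assumptions, not Lore's actual storage format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Atomic = Union[int, float, str]
Edge = Tuple[str, str]  # (label, oid of the sub-object)


@dataclass
class OEMObject:
    oid: str                          # unique object identifier, e.g. "&5"
    value: Union[Atomic, List[Edge]]  # atomic value, or labeled edges to sub-objects

    @property
    def is_complex(self) -> bool:
        # Complex objects are the ones with outgoing edges.
        return isinstance(self.value, list)

    @property
    def type_name(self) -> str:
        return "complex" if self.is_complex else type(self.value).__name__


def build_sample_db() -> Dict[str, OEMObject]:
    # Hypothetical data, loosely modeled on the DBGroup example.
    objects = [
        OEMObject("&1", [("Member", "&2"), ("Member", "&3")]),
        OEMObject("&2", [("Name", "&4"), ("Age", "&5")]),
        OEMObject("&3", [("Name", "&6")]),
        OEMObject("&4", "Smith"),
        OEMObject("&5", 46),
        OEMObject("&6", "Jones"),
    ]
    return {obj.oid: obj for obj in objects}


db = build_sample_db()
```

Note that the second member has no Age edge at all, which is exactly the kind of irregularity a fixed schema would not tolerate.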
OEM (Example)
Lorel (Lore’s Query Language)
Lorel is an extension of OQL
Lorel supports path expressions for
traversing graph data
A simple path expression is a name followed by a sequence of labels.
DBGroup.Member.Office: the set of objects that can be reached starting from the DBGroup object, following edges labeled Member and then Office.
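A minimal sketch of how a simple path expression could be evaluated over a labeled graph. The dictionary layout, the `NAMES` entry point table, and the sample oids are assumptions made for illustration; this is not Lore's evaluator.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical graph: oid -> atomic value, or list of (label, child oid) edges.
GRAPH: Dict[str, Union[int, float, str, List[Tuple[str, str]]]] = {
    "&1": [("Member", "&2"), ("Member", "&3")],
    "&2": [("Name", "&4"), ("Office", "&5")],
    "&3": [("Name", "&6"), ("Office", "&7")],
    "&4": "Smith",
    "&5": "Gates252",
    "&6": "Jones",
    "&7": [("Building", "&8"), ("Room", "&9")],
    "&8": "CIS",
    "&9": "411",
}
NAMES = {"DBGroup": "&1"}  # named entry points into the graph


def eval_path(path: str) -> List[str]:
    """Return the oids reachable from the named object by following the labels in order."""
    name, *labels = path.split(".")
    frontier = [NAMES[name]]
    for label in labels:
        next_frontier: List[str] = []
        for oid in frontier:
            value = GRAPH[oid]
            if isinstance(value, list):  # only complex objects have outgoing edges
                for edge_label, child in value:
                    if edge_label == label and child not in next_frontier:
                        next_frontier.append(child)
        frontier = next_frontier
    return frontier
```

With this data, `eval_path("DBGroup.Member.Office")` reaches both an atomic office and a complex one, and a path that does not exist simply yields an empty list rather than an error.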
Lorel
Range variables can be assigned to path expressions.
Path expressions are used directly in queries in an SQL style:
select DBGroup.Member.Office
where DBGroup.Member.Age > 30
Lorel
Result:
Office “Gates252”
Office
Building “CIS”
Room “411”
Lorel (Behind the scenes)
Previous query rewritten to OQL style:
select O
from DBGroup.Member M, M.Office O
where exists y in M.Age : y > 30
The comparison on Age is transformed into an existential condition:
A user can ask DBGroup.Member.Age < 30 regardless of whether Age is single valued, set valued, or unknown.
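The existential reading of the comparison can be sketched as follows. The member records are hypothetical, and each label maps to a list so that a missing, single, or set-valued Age is handled uniformly; this is an illustration of the semantics, not Lore's implementation.

```python
from typing import Dict, List, Union

Member = Dict[str, List[Union[int, str]]]

# Hypothetical members: Age is single valued, set valued, and absent in turn.
MEMBERS: List[Member] = [
    {"Name": ["Smith"], "Age": [46], "Office": ["Gates252"]},
    {"Name": ["Jones"], "Age": [28, 35], "Office": ["CIS 411"]},
    {"Name": ["Clark"], "Office": ["Gates100"]},
]


def select_offices(members: List[Member], min_age: int) -> List[Union[int, str]]:
    """select O from DBGroup.Member M, M.Office O where exists y in M.Age : y > min_age"""
    result: List[Union[int, str]] = []
    for m in members:                      # from DBGroup.Member M
        ages = m.get("Age", [])            # zero, one, or many Age sub-objects
        if any(y > min_age for y in ages):  # exists y in M.Age : y > min_age
            result.extend(m.get("Office", []))  # M.Office O
    return result
```

A member without any Age simply fails the existential test, so the query never raises an error on irregular data.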
Lorel (More examples)
select DBGroup.Member.Name
where DBGroup.Member.Office(.Room%)? like "%252"

Result:
Name “Jones”
Name “Smith”
Update:
update P.Member += ( select DBGroup.Member where DBGroup.Member.Name = "Clark" )
from DBGroup.Project P
where P.Title = "Lore" or P.Title = "Tsimmis"
Lore’s General Architecture
Query and Update Processing
External Data
DataGuides
Query and Update Processing
Queries
Data Engine
(A Set of OEM objects)
Query Plan Generator
select O
from DBGroup.Member M, M.Office O
where exists y in M.Age : y > 30
Query Iterators
Use recursive iterator approach:
execution begins at top of query plan
each node in the plan requests a tuple at a time
from its children and performs some operation
on the tuple(s).
pass result tuples up to parent.
Tuples (Object Assignment)
An Object Assignment (OA) is a data structure containing slots for range variables, with additional slots depending on the query.
Each slot within an OA holds the oid of a vertex on a path being considered by the query engine.
At the end of a query we should end up with complete OAs.
Query Operators
The Scan operator returns all oids that are sub-objects of a given object following a specified path expression:
Scan (StartingOASlot, Path_expression, TargetOASlot)
For each oid in StartingOASlot, check whether the object satisfies Path_expression and place the oid into TargetOASlot.
For each OA returned by its left child, the Join operator calls its right child exhaustively until no more OAs are returned.
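The Scan and Join operators, and the tuple-at-a-time iterator style, can be sketched with Python generators. This is a simplified illustration under stated assumptions: an OA is a plain dictionary, `scan` follows a single label rather than a general path expression, and the graph contents are hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Tuple, Union

OA = Dict[str, str]  # object assignment: slot name -> oid

# Hypothetical graph: oid -> atomic value, or list of (label, child oid) edges.
GRAPH: Dict[str, Union[int, str, List[Tuple[str, str]]]] = {
    "&1": [("Member", "&2"), ("Member", "&3")],
    "&2": [("Age", "&4"), ("Office", "&5")],
    "&3": [("Office", "&7")],
    "&4": 46,
    "&5": "Gates252",
    "&7": "CIS 411",
}


def scan(oa: OA, start_slot: str, label: str, target_slot: str) -> Iterator[OA]:
    """Yield one extended OA per sub-object of oa[start_slot] reachable via `label`."""
    value = GRAPH[oa[start_slot]]
    if isinstance(value, list):
        for edge_label, child in value:
            if edge_label == label:
                yield {**oa, target_slot: child}


def join(left: Iterator[OA], right: Callable[[OA], Iterator[OA]]) -> Iterator[OA]:
    """For each OA from the left child, exhaust the right child before moving on."""
    for left_oa in left:
        yield from right(left_oa)


# Plan for: from DBGroup.Member M, M.Office O
root: OA = {"DBGroup": "&1"}
plan = join(
    scan(root, "DBGroup", "Member", "M"),
    lambda oa: scan(oa, "M", "Office", "O"),
)
offices = [oa["O"] for oa in plan]
```

Because generators are lazy, the consumer at the top of the plan pulls one OA at a time, and each operator in turn pulls from its children, which mirrors the recursive iterator approach described earlier.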
Query Operators (cont)
The aggregation operator (Aggr) adds to
the target slot the result of the
aggregation.
The Join, Project and Select are almost
identical to their corresponding relational
operators
Other operators: CreateSet, GroupBy, ArithOp
Query Operators (Visualize the Words)
Query Optimizer
Does only a few optimizations:
Pushes selection operators down the query tree.
Eliminates or combines redundant query operators.
Explores query plans that use indexes when possible.
Two kinds of indexes:
Lindex (link index): returns all parent OIDs of a given OID via a label; implemented with hashing.
Vindex (value index): returns all atomic objects with a given label that satisfy a condition; implemented as B+-trees.
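A rough sketch of the two index kinds. The assumptions here are significant: a Python dictionary stands in for the hash-based Lindex, a sorted list searched with `bisect` stands in for the B+-tree Vindex, only numeric values are indexed, and the edges and atoms are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple, Union

# Hypothetical data: (parent oid, label, child oid) edges and atomic values.
EDGES: List[Tuple[str, str, str]] = [
    ("&1", "Member", "&2"),
    ("&1", "Member", "&3"),
    ("&2", "Age", "&4"),
    ("&3", "Age", "&5"),
    ("&2", "Office", "&6"),
]
ATOMS: Dict[str, Union[int, float, str]] = {"&4": 46, "&5": 28, "&6": "Gates252"}


def build_lindex(edges):
    """Lindex stand-in: (child oid, label) -> list of parent oids."""
    lindex = defaultdict(list)
    for parent, label, child in edges:
        lindex[(child, label)].append(parent)
    return lindex


def build_vindex(edges, atoms):
    """Vindex stand-in: label -> list of (value, oid) sorted by value (numeric atoms only)."""
    vindex = defaultdict(list)
    for _parent, label, child in edges:
        value = atoms.get(child)
        if isinstance(value, (int, float)):
            vindex[label].append((float(value), child))
    for entries in vindex.values():
        entries.sort()
    return vindex


def vindex_greater_than(vindex, label: str, bound: float) -> List[str]:
    """Return oids of atomic objects with the given label whose value exceeds `bound`."""
    entries = vindex.get(label, [])
    keys = [value for value, _oid in entries]
    start = bisect.bisect_right(keys, float(bound))
    return [oid for _value, oid in entries[start:]]


lindex = build_lindex(EDGES)
vindex = build_vindex(EDGES, ATOMS)
```

An index-based plan for `Age > 30` would first ask the Vindex for matching Age objects and then use the Lindex to walk upward from each one to its parent Member, instead of scanning downward from DBGroup.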
Vindexes
Because of the non-strict typing system, there are three kinds: String Vindex, Real Vindex, and String-coerced-to-real Vindex.
A separate B+-tree of each type is constructed for each label.
Using a Vindex for a comparison:
If the type is string, do a lookup in the String Vindex.
If the value can be converted to real, then do a lookup in the String-coerced-to-real Vindex.
If the type is real or int, do almost the same thing.
Vindexes (cont)
Arg1 \ Arg2    String          Real            Int
String         --              String→real     Both→real
Real           String→real     --              Int→real
Int            Both→real       Int→real        --
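The coercion table can be read as a small function. This sketch makes two assumptions that the slides do not state: Python `float` stands in for Lore's real type, and a comparison whose string operand cannot be converted simply evaluates to false.

```python
from typing import Optional, Tuple, Union

Atomic = Union[int, float, str]


def to_real(value: Atomic) -> Optional[float]:
    """Convert a value to real, or return None if a string does not parse as a number."""
    try:
        return float(value)
    except ValueError:
        return None


def coerce_pair(arg1: Atomic, arg2: Atomic) -> Optional[Tuple[Atomic, Atomic]]:
    """Apply the table: same types are left alone, mixed types are both taken to real."""
    if type(arg1) is type(arg2):
        return arg1, arg2  # the "--" entries: no coercion needed
    x, y = to_real(arg1), to_real(arg2)
    if x is None or y is None:
        return None        # assumption: an unconvertible string makes the pair incomparable
    return x, y


def greater_than(arg1: Atomic, arg2: Atomic) -> bool:
    pair = coerce_pair(arg1, arg2)
    return pair is not None and pair[0] > pair[1]
```

For instance, an Age stored as the string "46" still satisfies a comparison against the integer 30, while an Age stored as "unknown" does not.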
Index Query plans
If the user’s query contains a comparison between a path expression and a value, and an appropriate Vindex and Lindex exist, an index query plan is generated.
Previous query:
select O
from DBGroup.Member M, M.Office O
where exists y in M.Age : y > 30
Index Query plans (cont)
Update Query plans
update P.Member += ( select DBGroup.Member where DBGroup.Member.Name = "Clark" )
from DBGroup.Project P
where P.Title = "Lore" or P.Title = "Tsimmis"
External Data
Enables retrieval of information from other data sources, transparently to the user.
An external object in Lore is a “placeholder” for the external data and specifies how Lore interacts with an external data source.
External Data (cont)
During query processing, the Scan operator notifies the external data manager whenever an external object is encountered.
The spec for an external object includes:
The location of a wrapper program to fetch and convert data to OEM
A timeout interval
A set of arguments used to limit the information fetched from the external source
DataGuides
A DataGuide is a concise and accurate summary of the structure of an OEM database (stored as an OEM database itself, kind of like the system catalog).
Very helpful, because:
Without an explicit database schema, it is difficult to formulate meaningful queries.
With no knowledge of the database structure, the query processor may perform unnecessary work.
Evaluating a path expression that doesn’t exist in the database is wasted effort.
Each possible path expression is encoded exactly once.
DataGuides (cont)
DataGuides are dynamically generated and maintained over an existing database.
Statistics can be stored in the DataGuide, for example the number of atomic objects of each type reachable by a path p.
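One way to build such a summary is a subset construction over the data graph, similar to converting a nondeterministic automaton into a deterministic one: each DataGuide node stands for the set of objects reachable by one label path. The sketch below is an assumption-laden illustration (hypothetical graph, hypothetical function names), not necessarily the algorithm Lore uses.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical graph: oid -> atomic value, or list of (label, child oid) edges.
GRAPH: Dict[str, Union[int, str, List[Tuple[str, str]]]] = {
    "&1": [("Member", "&2"), ("Member", "&3")],
    "&2": [("Name", "&4"), ("Age", "&5")],
    "&3": [("Name", "&6")],
    "&4": "Smith",
    "&5": 46,
    "&6": "Jones",
}

Guide = Dict[FrozenSet[str], Dict[str, FrozenSet[str]]]


def build_dataguide(graph, root: str) -> Guide:
    """Map each target set (oids reached by some label path) to its outgoing labels."""
    guide: Guide = {}
    pending = [frozenset([root])]
    while pending:
        targets = pending.pop()
        if targets in guide:  # already summarized; this also stops on cyclic data
            continue
        by_label = defaultdict(set)
        for oid in targets:
            value = graph[oid]
            if isinstance(value, list):
                for label, child in value:
                    by_label[label].add(child)
        guide[targets] = {label: frozenset(kids) for label, kids in by_label.items()}
        pending.extend(guide[targets].values())
    return guide


def target_set(guide: Guide, root: str, labels: List[str]) -> FrozenSet[str]:
    """Follow a label path in the guide; an empty result means the path does not exist."""
    targets = frozenset([root])
    for label in labels:
        targets = guide[targets].get(label)
        if targets is None:
            return frozenset()
    return targets


guide = build_dataguide(GRAPH, "&1")
```

A query processor could consult `target_set` before touching the data: an empty result tells it the path expression cannot match, and the size of a non-empty result is the kind of statistic that could be stored alongside each path.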
DataGuides (example)
When XML gets into Action
Little reminder:
Lore first proposal in 1995
XML new standard for data representation and
data exchange over the WWW.
Public class XML_data extends
Semi_structured_data
Lore among the pioneers to integrate XML in
their DBMS architecture
From Semistructured Data to
XML
Data Model
Query Language
DataGuides
Changes in The Data Model
Similar to an OEM object, an XML element in Lore is a pair <EID, VALUE>
EID: a unique element identifier
VALUE: either an atomic text string or a complex value containing:
A string value: the XML tag
An ordered list of attribute-name/atomic-value pairs
An ordered list of crosslink subelements of the form <label,EID>, reachable via IDREF or IDREFS
An ordered list of subelements of the form <label,EID>
Changes in The Data Model (cont)
Comments are ignored
When an XML document is mapped into
this new data model, it can be seen as a
directed labeled graph
Example
Query Language
Path expressions are extended to distinguish between subelements and attributes by using qualifiers:
DBGroup.Member.>Name → &6: use > to specify a subelement
DBGroup.Member.@Name → “Smith”: use @ to specify an attribute
DBGroup.Member.Name → &6, “Smith”: when no @ or > qualifier is used, both attributes and subelements are matched
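The effect of the qualifiers can be sketched over a toy element store. The dictionary layout, the `ROOTS` table, and the element contents are hypothetical and chosen only to reproduce the three results listed above.

```python
from typing import Dict, List

# Hypothetical elements: EID -> tag, attributes, and ordered (label, EID) subelements.
ELEMENTS: Dict[str, dict] = {
    "&1": {"tag": "DBGroup", "attrs": {}, "subs": [("Member", "&2")]},
    "&2": {"tag": "Member", "attrs": {"Name": "Smith"}, "subs": [("Name", "&6")]},
    "&6": {"tag": "Name", "attrs": {}, "subs": []},
}
ROOTS = {"DBGroup": "&1"}


def step(current: List[str], label: str) -> List[str]:
    """Follow one path step; a leading '>' or '@' restricts it to subelements or attributes."""
    qualifier = label[0] if label[0] in "@>" else ""
    name = label.lstrip("@>")
    results: List[str] = []
    for eid in current:
        element = ELEMENTS.get(eid)
        if element is None:  # an attribute value from a previous step, not an element
            continue
        if qualifier in ("", ">"):
            results += [child for lab, child in element["subs"] if lab == name]
        if qualifier in ("", "@") and name in element["attrs"]:
            results.append(element["attrs"][name])
    return results


def eval_path(path: str) -> List[str]:
    name, *labels = path.split(".")
    current = [ROOTS[name]]
    for label in labels:
        current = step(current, label)
    return current
```

Here the Member element carries Name both as an attribute and as a subelement, so the unqualified path returns both matches while each qualifier narrows the result to one.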
DataGuides
If a DTD is provided, Lore builds the corresponding DataGuide from it.
Otherwise, if no DTD is provided, a DataGuide is generated from the XML document.
Problems when updating:
When a DTD is provided, validity is assured.
With no DTD, the DataGuide is updated as the XML document is updated.
Conclusion
Lore was originally developed for the OEM data model in 1995; XML support was integrated later, in 1999.
Lore provided a clear and robust solution for storing, querying, and updating semistructured data (XML came after).
The Lore project was declared pretty much out of business in 2000 by the Stanford Database Group.
Questions???????