Lore Presentation paper

Introduction

By the early 1990s, many people argued that traditional databases were running into a limitation: they force all data to adhere to an explicitly specified schema, which is a burden when data elements and structures change during the execution of an application. This gave birth to the concept of schema-less, self-describing semistructured data. At that time there was no DBMS for semistructured data, so the Stanford Database Group, with research funding from DARPA, NASA and others, designed Lore, a DBMS built specifically for managing semistructured information. It was a pioneering project, built entirely from scratch and first released in 1995. In 1999 Lore was modified to fully support XML. The rest of this paper covers Lore's data model, its query language, and its general architecture.

Lore

Lore Data Model

OEM (Object Exchange Model) is the basic data model implemented in Lore. It is a simple, self-describing, nested object model for semistructured data. Data in this model can be thought of as a labeled directed graph in which the vertices are objects. Each object has a unique object identifier (oid), such as &5. Atomic objects have no outgoing edges and are typed (int, real, string, gif, etc.), while all objects that have outgoing edges are called complex objects.

In summary, an OEM object has:

Label: a character string (object aliases)
OID: unique object identifier
Type: atomic (int, real, string, ...) or complex
Value: for a complex object, a list of OIDs; for an atomic object, an atomic value of type int, real, string, ...

The following example is a logical graph representation of an OEM database:

[Figure 1: logical graph representation of an OEM database]

Lore Query Language

Lorel is Lore's query language. It is based on OQL and supports path expressions for traversing the OEM graph data.
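To make the graph view of OEM and the idea of traversing it by labels concrete, here is a minimal Python sketch. It is not Lore's implementation: the class names, the choice to store each label on the edge as a (label, child oid) pair, and the sample objects are all assumptions made for illustration, and the sample data is hypothetical rather than a copy of Figure 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

Atomic = Union[int, float, str]
Edges = List[Tuple[str, str]]  # (label, child oid) pairs; hypothetical encoding


@dataclass
class OEMObject:
    oid: str
    value: Union[Atomic, Edges]  # atomic value, or outgoing labeled edges

    @property
    def is_complex(self) -> bool:
        # Complex objects are the ones that carry outgoing edges.
        return isinstance(self.value, list)


class OEMDatabase:
    def __init__(self) -> None:
        self.objects: Dict[str, OEMObject] = {}
        self.names: Dict[str, str] = {}  # entry-point name -> oid

    def add(self, oid: str, value: Union[Atomic, Edges]) -> None:
        self.objects[oid] = OEMObject(oid, value)

    def children(self, oid: str, label: str) -> List[str]:
        obj = self.objects[oid]
        if not obj.is_complex:
            return []  # atomic objects have no outgoing edges
        return [child for (lbl, child) in obj.value if lbl == label]

    def follow(self, path: str) -> List[str]:
        """Return the oids reached by a path such as 'DBGroup.Member.Office'."""
        name, *labels = path.split(".")
        current = [self.names[name]]
        for label in labels:
            reached: List[str] = []
            for oid in current:
                for child in self.children(oid, label):
                    if child not in reached:  # keep set semantics, stable order
                        reached.append(child)
            current = reached
        return current


# Hypothetical sample data (not the contents of Figure 1).
db = OEMDatabase()
db.add("&1", [("Member", "&2"), ("Member", "&3")])
db.add("&2", [("Name", "&4"), ("Age", "&5"), ("Office", "&6")])
db.add("&3", [("Name", "&7"), ("Office", "&8")])
db.add("&4", "Smith")
db.add("&5", 46)
db.add("&6", "Gates252")
db.add("&7", "Jones")
db.add("&8", [("Building", "&9"), ("Room", "&10")])
db.add("&9", "CIS")
db.add("&10", "411")
db.names["DBGroup"] = "&1"

print(db.follow("DBGroup.Member.Office"))
```

Note that an Office can be atomic for one member and complex for another, and that a member may have no Age at all; the traversal handles both without any schema.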
A path expression in Lorel is a name followed by a sequence of labels. For example, in the figure shown before, DBGroup.Member.Office is the set of objects that can be reached by starting at the DBGroup object and following edges labeled Member and then Office. Range variables can be assigned to path expressions: the variable X in "DBGroup.Member.Office X" refers to that same set of objects.

Path expressions are used directly in queries in an SQL style:

    select DBGroup.Member.Office
    where DBGroup.Member.Age > 30

According to Figure 1, the result of this query would be:

    Result:
        Office "Gates252"
        Office
            Building "CIS"
            Room "411"

Internally, the previous query is rewritten in OQL style:

    select O
    from DBGroup.Member M, M.Office O
    where exists y in M.Age : y > 30

The comparison on Age is transformed into an existential condition, so a user can ask DBGroup.Member.Age < 30 regardless of whether Age is single valued, set valued, or unknown.

Lore's architecture

Lore's architecture is divided into three main parts:

API: a set of objects that allow the user to access Lore and either run queries or make changes in the core of the Lore machine.

Query Compilation: in this layer a query is first parsed and then transformed into an OQL-like query. A query plan built from query operators is generated from this query. The plan is then passed to the query optimizer, which looks for further improvements to the plan. Finally, the query plan is passed to the data engine.

Data Engine: the data engine contains the query operators module, which executes the query plan passed down from the query compilation layer. The object manager is the lower-level interface that interacts with physical storage, storing OEM objects into pages and retrieving them from pages.
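The existential reading of the where clause in the rewritten query discussed earlier can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary encoding of the graph, the helper names, and the sample members (including the office names other than "Gates252") are hypothetical and do not reproduce Figure 1 or Lore's code.

```python
from typing import Dict, List, Tuple, Union

# oid -> atomic value, or a list of (label, child oid) edges for complex objects.
# Hypothetical data: one member with a single Age, one with no Age,
# and one with a set-valued Age.
GRAPH: Dict[str, Union[int, str, List[Tuple[str, str]]]] = {
    "&1": [("Member", "&2"), ("Member", "&3"), ("Member", "&4")],
    "&2": [("Age", "&5"), ("Office", "&6")],
    "&3": [("Office", "&7")],
    "&4": [("Age", "&8"), ("Age", "&9"), ("Office", "&10")],
    "&5": 46,
    "&6": "Gates252",
    "&7": "Gates100",
    "&8": 25,
    "&9": 31,
    "&10": "Gates300",
}


def children(oid: str, label: str) -> List[str]:
    """Oids reached from `oid` over edges carrying `label`."""
    value = GRAPH[oid]
    if not isinstance(value, list):
        return []
    return [child for (lbl, child) in value if lbl == label]


def offices_of_members_older_than(root: str, threshold: int) -> List[str]:
    """select O from root.Member M, M.Office O where exists y in M.Age : y > threshold"""
    result: List[str] = []
    for m in children(root, "Member"):
        ages = [GRAPH[a] for a in children(m, "Age")]
        # Existential condition: true if at least one numeric Age qualifies.
        # A missing Age gives an empty list, so the member is simply skipped.
        if any(isinstance(y, (int, float)) and y > threshold for y in ages):
            result.extend(children(m, "Office"))
    return result


print(offices_of_members_older_than("&1", 30))
```

The member without an Age raises no error and the member with two Age values qualifies because one of them exceeds the threshold, which is the behavior the existential rewriting is meant to provide.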
Query Compilation

To illustrate query plan generation, take the rewritten query from the previous examples; its plan is shown in figure 3.

    select O
    from DBGroup.Member M, M.Office O
    where exists y in M.Age : y > 30

Lore uses a recursive iterator approach to execute query plans: execution begins at the top of the query plan, and each node in the plan requests one tuple at a time from its children, performs some operation on the tuple(s), and then passes the resulting tuples up to its parent.

Tuples are stored in a data structure called an OA (Object Assignment). An OA contains slots for the range variables, with additional slots depending on the query. Each slot within an OA holds the oid of a vertex on a path being considered by the query engine. At the end of a query we should end up with complete OAs.

As we can see in figure 3, the query plan uses query operators such as SCAN, PROJECT, and JOIN:

The Scan operator returns all oids that are sub-objects of a given object, following a specified path expression:

    Scan (StartingOASlot, Path_expression, TargetOASlot)

For each oid in StartingOASlot, it checks whether the object satisfies Path_expression and places the matching oid into TargetOASlot.

The Join operator, for each OA returned by its left child, calls the right child exhaustively until no more OAs are returned.

The aggregation operator (Aggr) adds the result of the aggregation to the target slot.

The Join, Project and Select operators are almost identical to their corresponding relational operators.

Other operators are also used, such as CreateSet, GroupBy, and ArithOp.

Once a query plan has been generated, it is passed to the query optimizer, which performs the following tasks:

Pushes selection operators down the query plan tree.
Eliminates or combines redundant query operators.
Explores query plans that use indexes when possible.

Two kinds of indexes are used in Lore:

Lindex (link index): returns all parent OIDs of a given OID via a label; it is implemented as a hash table.
Vindex (value index): returns all atomic objects with a given label that satisfy a condition; it is implemented as a B+-tree.

External Data

External Data is a module that enables the retrieval of information from other data sources, transparently to the user, by using external objects in the OEM graph representation of the database. An external object in Lore is a "placeholder" for the external data and specifies how Lore interacts with an external data source. During query processing, the Scan operator notifies the external data manager whenever an external object is encountered. The specification for an external object includes:

The location of a wrapper program that fetches the data and converts it to OEM.
A timeout interval.
A set of arguments used to limit the information fetched from the external source.

Data Guides

A DataGuide is a concise and accurate summary of the structure of an OEM database in which each possible path expression is encoded once. It is itself stored as an OEM database, somewhat like a system catalog. It is very helpful because:

Without an explicit database schema, it is difficult to formulate meaningful queries.
With no knowledge of the database structure, the query processor may perform unnecessary work, for example searching for a path expression that does not exist.

DataGuides are dynamically generated and maintained over an existing database, and they can store statistics such as the number of atomic objects of each type reachable by a path expression p. The following DataGuide is generated over the database of figure 1:

[Figure: DataGuide generated over the database of figure 1]

Integrating XML into Lore

Similar to an OEM object, an XML element in Lore is a pair <EID, VALUE>, where EID is a unique element identifier and VALUE is either an atomic text string or a complex value containing:

A string-valued tag: the XML tag.
An ordered list of attribute-name/atomic-value pairs.
An ordered list of crosslink sub elements of the form <label, EID>, reachable via IDREF or IDREFS.
An ordered list of sub elements of the form <label, EID>.

Comments are ignored.
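The <EID, VALUE> pair described above can be sketched as a small Python structure loaded from an XML string. This is an illustration only, not Lore's loader. Several choices are assumptions made for the example: the "&n" EID format, storing element text as an atomic child labeled "Text", treating an attribute literally named "id" as the ID, having the caller list which attribute names are IDREF/IDREFS (a real system would learn this from a DTD), and the sample attribute name "advisor". Text that follows a child element (mixed content) is not handled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List, Tuple, Union


@dataclass
class ComplexValue:
    tag: str
    attributes: List[Tuple[str, str]] = field(default_factory=list)   # (name, value)
    crosslinks: List[Tuple[str, str]] = field(default_factory=list)   # (label, EID)
    subelements: List[Tuple[str, str]] = field(default_factory=list)  # (label, EID)


# EID -> atomic text, or a ComplexValue
Store = Dict[str, Union[str, ComplexValue]]


def load(xml_text: str, idref_attrs: FrozenSet[str] = frozenset()) -> Tuple[str, Store]:
    store: Store = {}
    id_to_eid: Dict[str, str] = {}
    pending: List[Tuple[ComplexValue, str, str]] = []  # unresolved IDREFs
    counter = 0

    def new_eid() -> str:
        nonlocal counter
        counter += 1
        return f"&{counter}"

    def visit(node: ET.Element) -> str:
        eid = new_eid()
        value = ComplexValue(tag=node.tag)
        store[eid] = value
        for name, val in node.attrib.items():
            value.attributes.append((name, val))
            if name == "id":                  # assumed ID attribute
                id_to_eid[val] = eid
            if name in idref_attrs:           # IDREFS may hold several ids
                for target in val.split():
                    pending.append((value, name, target))
        if node.text and node.text.strip():
            text_eid = new_eid()
            store[text_eid] = node.text.strip()
            value.subelements.append(("Text", text_eid))
        for child in node:                    # document order is preserved
            value.subelements.append((child.tag, visit(child)))
        return eid

    # ElementTree's default parser drops comments, matching "comments are ignored".
    root_eid = visit(ET.fromstring(xml_text))
    for value, name, target in pending:       # resolve crosslinks after the walk
        if target in id_to_eid:
            value.crosslinks.append((name, id_to_eid[target]))
    return root_eid, store


DOC = """<DBGroup>
  <!-- this comment is dropped -->
  <Member id="m1" advisor="m2"><Name>Smith</Name></Member>
  <Member id="m2"><Name>Jones</Name></Member>
</DBGroup>"""

root, store = load(DOC, idref_attrs=frozenset({"advisor"}))
print(store[root].subelements)
print(store["&2"].crosslinks)
```

Crosslinks are resolved in a second pass because an IDREF may point to an element that appears later in the document.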
For the query language, path expressions were extended with qualifiers that distinguish between sub elements and attributes:

DBGroup.Member.>Name: the > qualifier matches only sub elements.
DBGroup.Member.@Name: the @ qualifier matches only attributes.
DBGroup.Member.Name: when no @ or > qualifier is used, both attributes and sub elements are matched.

For the DataGuide module, we can provide a DTD from which Lore builds the corresponding DataGuide. If no DTD is provided, a DataGuide is generated from the XML document. Updates are handled differently in the two cases:

When a DTD is provided, validity is assured.
With no DTD, the DataGuide is updated as the XML document is updated.

Conclusion

Lore was originally developed for the OEM data model and first released in 1995; XML support was integrated later, in 1999. At the time, Lore provided a clear and robust solution for storing, querying, and updating semistructured data, before XML arrived. The Stanford Database Group declared the Lore project essentially finished in 2000, but it was a project that helped lead the way for XML to become a standard, and later DBMSs that manipulate XML drew inspiration from Lore.