TU/e eindhoven university of technology
/faculty of mathematics and informatics

Exporting Databases in XML DTD
A Conceptual and Generic Approach

Philippe Thiran
Computer Science Department
Technische Universiteit Eindhoven
The Netherlands

• Current Situation
  – XML as the standard for publishing and exchanging data over the Web
  – Data recorded and maintained in existing databases
    • Heterogeneous databases: different data models
    • Limitation of database models
      – Database schema incompleteness (implicit/hidden structures)
      – Explicit and implicit interconnections among entities

  [Figure: order schema as an Oracle V5 model (no primary and foreign keys), and the same schema with identifiers (id) and references (ref)]
    Oracle V5 Model:
      Order (OderID, Customer, Date, Total[0-1])
      Detail (OderID, Reference, Quantity, Amount)
      Product (Reference, Label[0-1], UnitPrice, Supplier)
    With identifiers and references:
      Order (OderID, Customer, Date, Total[0-1]); id: OderID
      Detail (OderID, Reference, Quantity, Amount); id: Reference, OderID; ref: Reference; ref: OderID
      Product (Reference, Label[0-1], UnitPrice, Supplier[1-5]); id: Reference

• Migrating existing databases to XML
  – Principle
    • XML description in DTD
    • Bottom-up approach
    • Exploiting as much as possible the meaning of source data
  – Method and Tool
    • Method
      – Not limited to any specific database model
      – Capturing the explicit and implicit structures and interconnections of the database schema
    • Tool for supporting the method

Outline
  Schema Representation: database models and DTD
  Schema Manipulation: database schemas and DTD

• Schema Representation
  – Expressing database schemas and XML in terms of GER
    • Extended object-entity relationship data model
    • One rich and expressive model able to express data schemas whatever their
      operational data models
      – Operational database models like IMS, Relational, OO
      – XML-family models: XML DTD or XML Schema

• Schema Representation
  – Expressing XML in terms of GER
    • DTD expressed in terms of GER
      – DTD concepts
      – Hierarchical organization
      – Sequence organization

  DTD concepts                                                GER interpretation
  Element types                                               Entity types
  Hierarchy of element types                                  (Root) entity types, relationship types, father roles
  Content type ELEMENT                                        Relationship types
  Sequence organization (order of elements in the sequence)   Seq groups
  Occurrence operators on sub-elements (?, *, +)              Role cardinalities
  IDREF, GID attributes                                       IDREF, GID groups

• Schema Representation
  – Expressing XML in terms of GER (example)

    <!ELEMENT Catalog (Order*, Product*)>
    <!ELEMENT Order (Customer, Date, Total?, Detail+)>
    <!ATTLIST Order OrderID ID #REQUIRED>
    <!ELEMENT Customer ANY>
    <!ELEMENT Date (#PCDATA)>
    <!ELEMENT Total (#PCDATA)>
    <!ELEMENT Detail (Quantity, Amount)>
    <!ATTLIST Detail Product IDREF #REQUIRED>
    <!ELEMENT Quantity (#PCDATA)>
    <!ELEMENT Amount (#PCDATA)>
    <!ELEMENT Product (Supplier+)>
    <!ATTLIST Product Reference ID #REQUIRED
                      Label CDATA #IMPLIED
                      UnitPrice CDATA #REQUIRED>
    <!ELEMENT Supplier ANY>

  [Figure: GER representation of the DTD above]
    Catalog; seq: .Order[*], .Product[*]
    Order (OderID); seq: .Customer, .Date, .Total, .Detail[*]; gid: OderID
    Customer #any
    Date #pcdata
    Total #pcdata
    Detail (Product); idref: Product; seq: .Quantity, .Amount
    Quantity #pcdata
    Amount #pcdata
    Product (Reference, Label[0-1], UnitPrice); gid: Reference; seq: .Supplier[*]
    Supplier #any
    Father roles (f): 0-N from Catalog to Order and to Product; 1-1 from Order to Customer and to Date; 0-1 from Order to Total; 1-N from Order to Detail; 1-1 from Detail to Quantity and to Amount; 1-N from Product to Supplier. Every child role is 1-1.

• Schema Manipulation
  – Transforming XML DTD within GER
    • Schema transformations defined on GER
      – Reverse transformations, semantics-preserving transformations
  – Transformation operators
    • Standard transformations
      – For manipulating schemas expressed in operational database models
      – Example: transforming an entity type into an attribute
    • DTD-specific transformations

• Schema Manipulation
  – Transforming XML DTD within GER
    • Standard transformations
      – For manipulating schemas expressed in classical structured models
      – Example of a semantics-preserving transformation: transforming a relationship type into an entity type

  RT-ET: Transforms a relationship type into an entity type.
  Inverse: ET-RT
  [Figure: RT-ET example. A relationship type R with two 0-N roles is replaced by an entity type R that plays a 1-1 role in two new relationship types, rA and rB, and is identified by those roles (id: rB.B1, rA.A)]

• Schema Manipulation
  – Transforming XML DTD within GER
    • DTD-specific transformations (example)
      – Suited to derive a DTD from a structured data schema

  DTD-RT-to-HIER: Transforms a one-to-many (or one-to-one) binary relationship type into a hierarchical relation. The 1-1 role becomes the child role.
  Inverse: DTD-HIER-to-RT
  [Figure: DTD-RT-to-HIER example. Relationship type R between A (role 0-N) and B (role 1-1) becomes a hierarchical relation with father role f 0-N on A and child role 1-1 on B]

  Create-SEQ-GROUP: Adds a seq group to an entity type. That group contains the child roles played by its children (in an arbitrary order).
  Inverse: Del-SEQ-GROUP
  [Figure: Create-SEQ-GROUP example. An entity type that is father in R1 (f 0-N, child A) and in R2 (f 0-1, child B) receives the group seq: R1.A[*], R2.B]

Converting (legacy) databases into DTD
  Exploiting as much as possible the meaning of source data
  Capturing the explicit and implicit structures and interconnections

• Exporting Databases
  – Bottom-up approach (from the source to the target)
  – Semi-automated 4-step method
    • Extraction of the database schema (automated)
      – Extraction of the explicit structures and constraints
    • Semantics recovery (semi-automated)
      – Recovery of the implicit structures and constraints
    • Model translation (semi-automated)
      – Translation of a schema expressed in the GER into a schema expressed in the GER DTD
      – Use of the relations among entities
    • DTD exportation (automated)
      – Generation of the DTD document

• Exporting XML
  – Reverse Engineering
    • Recovery of the conceptual schema of an existing database
      – Augmentation of the knowledge about the data semantics
      – Database reverse engineering process (DB-MAIN)
      – Elicitation of hidden structures and constraints

  [Figure: from the Database Schema to the Conceptual Schema through schema transformations]
    Database Schema (FileCatalog: Product, Order, Detail):
      Order (OderID, Customer, Date, Total[0-1]); acc: OderID
      Detail (OderID, Reference, Quantity, Amount); acc: Reference, OderID
      Product (Reference, Label[0-1], UnitPrice, Supplier); acc: Reference
    Conceptual Schema:
      Order (OderID, Customer, Date, Total[0-1]); id: OderID
      Detail (Quantity, Amount), linking Order (role 1-N) and Product (role 0-N)
      Product (Reference, Label[0-1], UnitPrice, Supplier[1-5]); id: Reference

• Exporting XML
  – Model Translation
    • DTD-specific transformation
    • Non-deterministic process
      – It requires some design choices
      – The
        user inputs might have consequences on the properties and the semantics of the resulting schema
    • 5-step transformation process
      1. Schema preparation
      2. Hierarchy structure creation
      3. Constraint relaxation
      4. Attribute representation
      5. Ordering definition

• Exporting XML
  – Model Translation
    • Schema preparation
      – Removing invalid constructs
        » Multivalued/compound attributes
        » Complex relationship types

  [Figure: schema preparation applied to the conceptual schema]
    Conceptual Schema:
      Order (OderID, Customer, Date, Total[0-1]); id: OderID
      Detail (Quantity, Amount), linking Order (role 1-N) and Product (role 0-N)
      Product (Reference, Label[0-1], UnitPrice, Supplier[1-5]); id: Reference
    Prepared schema:
      Order (OderID, Customer, Date, Total[0-1]); id: OderID
      Detail (Quantity, Amount); id: of.Product, consists.Order
      Product (Reference, Label[0-1], UnitPrice); id: Reference
      Supplier (Supplier); id: supplied.Product, Supplier
      Relationship types: consists (Order 1-N, Detail 1-1); of (Detail 1-1, Product 0-N); supplied (Product 1-5, Supplier 1-1)

• Exporting XML
  – Model Translation
    • Hierarchical structure creation
      – Entity types and relationship types are transformed into a tree
        • by electing natural roots (significant concepts)
        • by resolving father conflicts
        • by breaking cycles
        • by (possibly) adding a unique root

  [Figure: hierarchy structure creation]
    Before: the prepared schema (Order, Detail, Product, Supplier, with relationship types consists, of, supplied)
    After:
      Catalog; father roles f 0-N to Order and f 0-N to Product
      Order (OderID, Customer, Date, Total[0-1]); id: OderID; father role f 1-N to Detail
      Detail (Reference, Quantity, Amount); id: Reference, .f; ref: Reference
      Product (Reference, Label[0-1], UnitPrice); id: Reference; father role f 1-5 to Supplier
      Supplier (Supplier); id: .f, Supplier

• Exporting XML
  – Model Translation
    • Constraint relaxation
      – Role cardinalities extension
      – Gid and idref groups creation

  [Figure: constraint relaxation applied to the Catalog hierarchy]
    After relaxation:
      Catalog; father roles f 0-N to Order and f 0-N to Product
      Order (OderID, Customer, Date, Total[0-1]); gid: OderID; father role f 1-N to Detail
      Detail (Reference, Quantity, Amount); id: Reference, .f; idref: Reference
      Product (Reference, Label[0-1], UnitPrice); gid: Reference; father role f 1-N to Supplier
      Supplier (Supplier); gid: .f, Supplier
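
The role cardinalities extension can be illustrated with a small Python sketch. It is not DB-MAIN code: the function name relax_cardinality is hypothetical, and the rule it applies is an assumption drawn from the example, where the 1-5 role on Supplier becomes 1-N because a DTD can only express ?, * and +. Relaxing a minimum above 1 down to 1 is a further assumption that the slides do not state.

```python
from typing import Optional


def relax_cardinality(min_card: int, max_card: Optional[int]) -> str:
    """Map a GER role cardinality min-max to a DTD occurrence operator.

    max_card=None stands for N (unbounded). Hypothetical helper, not DB-MAIN API.
    """
    if min_card < 0 or (max_card is not None and max_card < max(min_card, 1)):
        raise ValueError(f"invalid cardinality {min_card}-{max_card}")
    # Assumption: any minimum above 1 is relaxed to 1.
    optional = min_card == 0
    # Any maximum above 1 is relaxed to N, as with 1-5 becoming 1-N.
    repeatable = max_card is None or max_card > 1
    if repeatable:
        return "*" if optional else "+"
    return "?" if optional else ""


# Cardinalities taken from the Catalog example.
print(repr(relax_cardinality(1, 5)))     # Product to Supplier
print(repr(relax_cardinality(0, 1)))     # Order to Total
print(repr(relax_cardinality(0, None)))  # Catalog to Order
```

The relaxation loses information: a document with six Supplier elements under one Product is valid against the relaxed schema although it violates the original 1-5 constraint.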
• Exporting XML
  – Model Translation
    • Attribute representation
    • Ordering definition

  [Figure: resulting GER DTD schema]
    Catalog; seq: .Order[*], .Product[*]
    Order (OderID); seq: .Customer, .Date, .Total, .Detail[*]; gid: OderID
    Customer #any
    Date #pcdata
    Total #pcdata
    Detail (Product); idref: Product; seq: .Quantity, .Amount
    Quantity #pcdata
    Amount #pcdata
    Product (Reference, Label[0-1], UnitPrice); gid: Reference; seq: .Supplier[*]
    Supplier #any

CASE Support – DB-MAIN
  Model Expression: database models and DTD
  Model Translation: DTD-specific transformation

• CASE Support – DB-MAIN
  – Basic Features
    • Dedicated to database application engineering
    • Based on the GER
    • Includes transformation operators, reverse engineering processors and schema analysis tools
    • Extraction facilities (SQL, Codasyl, RPG, IMS, etc.)

• CASE Support
  – *-to-DTD Transformation
    • DTD-specific transformations
    • Assistant

• Conclusions
  – Rich and expressive data model
    • Translating semantics of both database and XML models
  – Non-deterministic aspect of the model translation
    • The same database schema can lead to a large set of equivalent XML structures
  – CASE Support (application)
    • Automatic production of XML documents
      – that comply with the DTD that has been computed
      – based on the schema transformations used to convert the database schema into an XML DTD
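
The last step of the method, DTD exportation, generates the DTD document from the ordered hierarchical schema. The Python sketch below shows one way such a generator might look. The ElementType class and the export_dtd function are hypothetical names introduced for illustration, not part of DB-MAIN, and the schema is entered by hand rather than derived from a GER schema.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ElementType:
    """One entity type of the hierarchical schema (hypothetical structure)."""
    name: str
    content: str = "#PCDATA"  # "#PCDATA" or "ANY"; ignored when children exist
    # Ordered (child name, occurrence operator) pairs, i.e. the seq group.
    children: List[Tuple[str, str]] = field(default_factory=list)
    # (attribute name, type, default declaration) triples.
    attributes: List[Tuple[str, str, str]] = field(default_factory=list)


def export_dtd(elements: List[ElementType]) -> str:
    """Emit one ELEMENT declaration per type, plus an ATTLIST when needed."""
    lines = []
    for e in elements:
        if e.children:
            model = "(" + ", ".join(n + op for n, op in e.children) + ")"
        elif e.content == "ANY":
            model = "ANY"
        else:
            model = "(#PCDATA)"
        lines.append(f"<!ELEMENT {e.name} {model}>")
        if e.attributes:
            atts = " ".join(f"{n} {t} {d}" for n, t, d in e.attributes)
            lines.append(f"<!ATTLIST {e.name} {atts}>")
    return "\n".join(lines)


# The Catalog example from the slides, entered by hand.
schema = [
    ElementType("Catalog", children=[("Order", "*"), ("Product", "*")]),
    ElementType("Order",
                children=[("Customer", ""), ("Date", ""),
                          ("Total", "?"), ("Detail", "+")],
                attributes=[("OrderID", "ID", "#REQUIRED")]),
    ElementType("Customer", content="ANY"),
    ElementType("Date"),
    ElementType("Total"),
    ElementType("Detail", children=[("Quantity", ""), ("Amount", "")],
                attributes=[("Product", "IDREF", "#REQUIRED")]),
    ElementType("Quantity"),
    ElementType("Amount"),
    ElementType("Product", children=[("Supplier", "+")],
                attributes=[("Reference", "ID", "#REQUIRED"),
                            ("Label", "CDATA", "#IMPLIED"),
                            ("UnitPrice", "CDATA", "#REQUIRED")]),
    ElementType("Supplier", content="ANY"),
]

dtd = export_dtd(schema)
print(dtd)
```

Keeping the children as an ordered list mirrors the seq groups: the order fixed during the ordering definition step is the order written into each content model.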