XML In An RDBMS World
Michael D. Thomas

Data, data, data
• In general, computing is never far from data
• XML and RDBMSes both deal with data and play in the same very large playground
• Extensive overlap between RDBMSes and XML
• Goal: Learn how to choose between RDBMS and XML, and how to integrate the two technologies

First, some pragmatism…
• “When all you have is a hammer, everything looks like a nail.”
• What if all you have is a hammer?
• “When all you have is a hammer, everything must look like a nail.”
• Best to have XML and RDBMS in your toolset
• Architectural purity is often impossible
• But we should know when & what we are compromising

Tonight’s Agenda
• Compare XML and SQL RDBMS Technologies
• Understand Their Sweet Spots
• Examine Their Use Through Studies of Anti-Patterns and Patterns Of Usage
• Discuss XQuery and XML Storage In The RDBMS

Topics
• Defining The Playground: XML vs. RDBMS
• Anti-Pattern: XML As Transactional Data Store
• Data Stores & The ACID Test
• Anti-Pattern: Document Storage & The RDBMS
• XQuery & RDBMS Doc Storage
• Anti-Pattern: XML Declarative Languages & Gratuitous Encapsulation Of SQL
• Understanding Declarative Languages, Such As SQL
• Anti-Pattern: XML Meta-data & The Overly Abstract Database
• Managing The Split Between XML-based & RDBMS-based Data Models In The Same Application

XML As Database Anti-Pattern
• “I’ll use an XML file as a database”
• Advantages: Simple, Cheap
• An XML Schema defines the metadata of the database
• Data is right on the filesystem, not “hidden” in a complicated SQL RDBMS

XML As Database Problems
• Updates are hard
• Have to re-invent concurrency handling (multiple users changing data at the same time)
• No constraint checking
• No optimization, such as indexes, caching, etc.
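
The concurrency problem listed above can be made concrete with a small sketch. It is illustrative only: the `accounts` document, the `deposit` helper, and the in-memory `stored_xml` string (standing in for a file on disk) are assumptions, not part of the talk. Two users read the same document, each changes a private copy, and each writes the whole document back, so one update is silently lost.

```python
import xml.etree.ElementTree as ET

# Hypothetical "database": one XML document holding an account balance.
# A string stands in for the file on disk so the sketch is self-contained.
stored_xml = "<accounts><account id='1'><balance>100</balance></account></accounts>"


def read_db():
    """Each user parses a private copy of the whole document."""
    return ET.fromstring(stored_xml)


def write_db(root):
    """Writing back replaces the whole document; nothing detects conflicts."""
    global stored_xml
    stored_xml = ET.tostring(root, encoding="unicode")


def deposit(root, amount):
    """Add an amount to account 1 in the given (private) copy."""
    balance = root.find("./account[@id='1']/balance")
    balance.text = str(int(balance.text) + amount)


# Two users read the same starting state ...
user_a = read_db()
user_b = read_db()

# ... each applies a deposit to a private copy ...
deposit(user_a, 50)
deposit(user_b, 25)

# ... and each writes the full document back. The second write wins.
write_db(user_a)
write_db(user_b)

final_balance = int(ET.fromstring(stored_xml).find("./account/balance").text)
print(final_balance)  # 125: user A's deposit of 50 has been lost
```

An RDBMS would serialize the two updates or reject one of them; with a plain XML file, the application has to build that protection itself.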
Organization of Data
• We have been formalizing the storage of information for a long time
• The age of computing called for new approaches
• The relational model has been highly successful
• XML is relatively new
• XML is good for loosely structured data (documents) and for data transmission

Transmission & Interoperability
• Electronic transmission is newer than electronic storage
• With the growth of the Internet, transmission of data is exploding
• No interoperability problems in 1950; lots of interoperability problems now
• XML is an important standard for transmission and interoperability

Vocabulary
• Datastore: Anything that stores data
• SQL Relational Database: A database that organizes info in tables and adheres to relational theory
• XML: 1) eXtensible Markup Language 2) The XML standard 3) All or part of an XML document

Vocabulary (cont.)
• XML Database: A database specialized for the storage of XML documents
• Object Database: A database that stores objects
• Object-Relational Database: A relational database with an extensible type system

Data Centric vs. Document Centric
• Data centric approach: the datastore is focused on handling highly structured, fine-grained data. Favors the relational model
• Document centric approach: the datastore is focused on handling semi-structured data, such as web pages, books, etc. Favors XML

Data Centric vs. Document Centric (cont.)
• The best queries are largely a result of structuring the data well (messy desk vs. organized file cabinet)
• Documents are semi-structured data with ad hoc structures
• The overhead of defining rigorous structure for each type of document increases the overall cost of management (though it would still give you the best queries)

Storage vs. Transmission
• Data storage and data transmission are two different concepts
• XML is very strong for interoperable transmission
• You can store XML by writing it to a file
• You can “transmit” relational data by exporting a few tables and FTPing them, but that isn’t a strong solution

Datastore Application Stack
[Diagram labels: Application, Setup, Metadata Definition, Data Persistence, Datastore, Query]

XML/Relational Comparison
                      Relational/SQL          XML
Metadata Definition   Create Table…           Define XML Schema (Optional)
Data Persistence      Insert …, Update …      Create XML Document
Data Query            Select …                XPath, XQuery

Application With XML Transmissions
[Diagram labels: Datastore, Application, XML, XSLT Processor, XSLT Style sheets, outputs (X)HTML, CSV, and XML, XML Web Service Producer, XML Web Service Consumer]

Three Data Models
• Persistent Model: how data is stored
• Active Model (Object Model): how data is arranged when it is being manipulated by a program, usually written in an imperative language
• Presentation/Transmission Model: how data is transmitted, usually as XML

Datastore Basics
Any datastore must tackle the following issues:
• Concurrency
• Transactions (the ACID test)
• Locking
• Joins
• Normalization
• Administration Issues

Concurrency
• Datastores support concurrency if multiple users can access the same datastore at the same time
• Datastores must not allow the same data to be modified at the same time

Transactions
• Atomicity: No matter how complex, a transaction is atomic and indivisible. Transactions are “all or nothing.”
• Consistency: Transactions must leave the database in a consistent state, i.e., consistent with the rules
• Isolation: A transaction is isolated from other transactions
• Durability: The effects of a transaction persist

Isolation levels
Each level is listed with the anomalies it still allows:
• TRANSACTION_READ_UNCOMMITTED: dirty reads, non-repeatable reads, phantom reads
• TRANSACTION_READ_COMMITTED: non-repeatable reads, phantom reads
• TRANSACTION_REPEATABLE_READ: phantom reads
• TRANSACTION_SERIALIZABLE: none of the above

Locking
• Data must be locked for transactions to be isolated
• Locking is both a datastore and an application concern
• How much extraneous data is locked? (Page level locking, document level locking)
• Pessimistic locking: prevents reading of locked data
• Optimistic locking: generates an error when inappropriate data updates are attempted

Joins
• A join combines the data of two different data entities
• E.g., SELECT * FROM emp, dept WHERE emp.deptno = dept.deptno
• Joins are the cornerstone of SQL
• XPath doesn’t do joins between XML documents! XQuery and other languages can

Normalization
• Normalization: organizing data to minimize redundancy
• Normalized data is easier to maintain and easier to understand conceptually
• In relational DB design, normalized DBs sometimes need to be denormalized for performance reasons
• XML docs can also be normalized, but there is not as much support for tying elements together
• Normalization is important when designing, less crucial at implementation

Administrative Issues
• Includes backup & recovery, installation, upgrades, optimization, maintenance, etc.
• In general:
• Bigger is better (economies of scale, 24x7 support)
• More popular, more standard is better (law of increasing returns)
• Whatever is already working in your organization is better (no need to hire or re-train administrators)
• Existing DB vendors have a huge advantage

Datastores
• Different types:
• Relational Database: used for highly structured data such as account balances, inventory quantities, etc.
• Document Database: used to store documents, probably in XML format
• The same DB can serve as both, e.g., Oracle

Overview Of Relational Databases
• The Relational Model
• SQL
• Entity Relationship Diagrams

Relational Model: structure
• Data is grouped in tables
• Tables have columns and rows
• Columns are fairly fixed: the set of columns shouldn’t change much (if at all) over the life of a table
• A table can have any number of rows
• Rows change constantly

Relational model: primary key
• In general, a table should have one or more columns defined as the primary key
• The primary key is unique and non-null
• Usually only one column
• Can consist of more than one column (composite primary key)

Relational model: foreign key
• A foreign key describes a relationship between two tables
• The foreign key column of tableA points to a column in tableB
• TableB is said to be the parent of tableA
• Often, the foreign key points at a primary key

SQL
Three types:
• Query: SELECT * FROM emp WHERE deptno=10 ORDER BY ename
• Data Manipulation Language (DML): Update, Insert
• Data Definition Language (DDL): Create Table

Joins
A join starts from a Cartesian cross-product, which the WHERE clause then filters:
  SELECT count(*) FROM emp;
  SELECT count(*) FROM dept;
  SELECT count(*) FROM emp, dept;
  SELECT ename, emp.deptno, dept.deptno, dname FROM emp, dept;
  SELECT ename, emp.deptno, dept.deptno, dname FROM emp, dept WHERE emp.deptno = dept.deptno;

Joins (cont.)
  SELECT ename, dname FROM emp, dept WHERE emp.deptno = dept.deptno

Entity Relationship Diagrams
• Used to create a map of your data
• Describes tables, attributes of tables and relationships between tables
• Come at it from two directions: lay out the entities and assign the attributes; group the attributes into entities

XML vs. SQL Tables
• XML: order matters! (Rows in a relational table are unordered)
• XML is a tree structure
• XML documents tend to be semi-structured; SQL tables are highly structured
• SQL tables aren’t as flexible or interchangeable
• XML joins aren’t straightforward
• An XML document/element doesn’t serve multiple purposes as well as a DB schema/table

Anti-Pattern: XML Document Storage As A SQL BLOB
• Need to store XML in a database
• Better than storing in a filesystem
• Make a BLOB (Binary Large Object) column and store each document as a single value in a row of a table
• Problem: can’t query the contents of a document without extracting it from the DB!
• Even for simple queries, might have to extract all of the documents into the application, which is very inefficient

Anti-Pattern: One-off SQL Schema For A Particular XML Document
• BLOB storage is bad, so why not “shred” the document across multiple DB tables?
• I.e., for XML elements named “Dept” make a “Dept” table, for “Emp” elements make an “Emp” table, and describe the hierarchy with foreign key constraints
• Query performance is much, much better; probably better than XPath against XML as a file

Problems
• A lot of work! Have to do this for every XML schema
• Hard! XML schemas are inherently more flexible than SQL schemas. Some mappings can be difficult
• Negates the flexibility of XML
• Not as learnable: programmers have to learn to query your SQL schema, not just XPath and XML Schema

Anti-Pattern: Developing A General “Shredding” Solution For All Of XML
• Your application has several XML schemas, and it’s time-consuming to develop “shredding” for all of them
• So, you try to do a more generalized shredding
• It is possible, but it is a very horizontal problem. You probably won’t get the time to solve it completely.
• Still might present learnability problems

XML Datastore Architectures
• XML Views Of Relational Data
• Relational Wrappers Of XML
• Independent Storage of XML Documents (Native XML Database)
• Text Storage Of XML In RDBMS
• XML Shredding Across Relational Tables
• Storing XML as Objects in Object-Relational DBs (Oracle XMLType)
• Everything is XML (XQuery approach)

Vocabulary: XML Collections
• An XML Collection is a collection of XML documents
• “A row is to a table as an XML document is to an XML collection.”

XML Derived From RDBMS
• Data exists naturally as relational data
• Needs to be represented as an XML document for some reason
• The derived XML is usually used for transmission

Independent XML Store
• A specialized database is used to store XML documents
• Typically, an application will either have two datastores (relational and XML) or relational data will be stored as XML
• In the dual datastore case, the application code has to join the different data sets

Text Storage Of XML In Databases
• XML is stored in a column of a relational table
• Allows you to mix structured and semi-structured approaches
• However, it is hard to query against the XML directly

XML Shredding Across Tables
• An XML document is stored across many tables
• It is possible to use SQL queries against the document’s parts
• Vendors can implement an XPath-to-SQL translator
• Can structure the tables based on a schema
• A pain to handle yourself

Exercise
• Shred an XML schema across relational tables

XMLType
• Object-Relational databases allow you to define your own types
• You could define an Address object, define functions for the object, and store instances of the object in a column in the database
• Oracle defines an object-relational type, XMLType, for the storage of XML documents in the database
  SELECT e.poDoc.getClobval() AS poXML
  FROM po_xml_tab e
  WHERE e.poDoc.existsNode('/PO[PNAME = "po_2"]') = 1;
• Shredding is managed & encapsulated for you
• Oracle 9i: XMLType, Text

Everything Is XML (XQuery)
• With XMLType, Oracle says that everything fits into the Object-Relational realm
• XQuery says that everything can be represented and queried as XML
• Relational data is derived from the RDBMS using SQL/XML

XQuery SQL/XML
  select xmlelement("emp",
           'Employee ',
           xmlelement("name", e.job || ' ' || e.ename),
           ' was hired on ',
           xmlelement("hiredate", e.hiredate)) as result
  from emp e;
  ------------------------------------------------------------------------------
  <emp>Employee <name>CLERK SMITH</name> was hired on <hiredate>17-DEC-80</hiredate> </emp>

Query Soup
• SQL: queries relational data
• XPath: queries a particular XML document
• SQL/XML: a standard for deriving XML from relational data
• XQuery: queries a collection of XML documents and uses XPath to query particular XML documents

Query Soup (cont.)
• There are other XML query languages, such as XML-QL and Quilt
• XSLT is a transformation language, not a query language
• XSQL & MS XML/SQL are wrapper languages

Anti-Pattern: Encapsulating SQL With XML
• General Rule: “Encapsulation is good, but encapsulating good things is bad.”
• SQL is good (also, HTML is good)
• Why hide SQL with XML?
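
To make the “XML derived from relational data” idea from the SQL/XML slide tangible without an Oracle instance, here is a minimal sketch in Python. It is not SQL/XML itself: SQLite has no `xmlelement`, so the element is assembled in application code with `xml.etree.ElementTree`. The table contents and the `emp_to_xml` helper are assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed sample data, mirroring the row shown on the SQL/XML slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, job TEXT, hiredate TEXT)")
conn.execute("INSERT INTO emp VALUES ('SMITH', 'CLERK', '17-DEC-80')")


def emp_to_xml(job, ename, hiredate):
    """Build the mixed-content <emp> element that xmlelement() would produce."""
    emp = ET.Element("emp")
    emp.text = "Employee "              # text before the first child
    name = ET.SubElement(emp, "name")
    name.text = f"{job} {ename}"
    name.tail = " was hired on "        # text between <name> and <hiredate>
    hired = ET.SubElement(emp, "hiredate")
    hired.text = hiredate
    return ET.tostring(emp, encoding="unicode")


results = [
    emp_to_xml(job, ename, hiredate)
    for job, ename, hiredate in conn.execute("SELECT job, ename, hiredate FROM emp")
]
print(results[0])
# <emp>Employee <name>CLERK SMITH</name> was hired on <hiredate>17-DEC-80</hiredate></emp>
```

The difference from SQL/XML is where the work happens: here the application builds the XML after the query returns, whereas SQL/XML lets the database return XML directly.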
XML encapsulating SQL
  SELECT emp.ename, dept.dname
  FROM emp, dept
  WHERE emp.deptno = dept.deptno

  <sql-query>
    <fields>
      <field> ename </field>
      <field> dname </field>
    </fields>
    <tables>
      <table> emp </table>
      <table> dept </table>
    </tables>
    <joins>
      <join>emp.deptno = dept.deptno </join>
    </joins>
  </sql-query>

Same concept in XQueryX
• “XQueryX is an XML representation of an XQuery… The result is not particularly convenient for humans to read and write, but it is easy for programs to parse…”

XQuery Example
XQuery:
  <bib>
  {
    for $b in doc("http://www.bn.com/bib.xml")/bib/book
    where $b/publisher = "Addison-Wesley" and $b/@year > 1991
    return
      <book year = "{ $b/@year }">
        { $b/title }
      </book>
  }
  </bib>

XQueryX Version
  <?xml version="1.0" encoding="UTF-8"?>
  <?xml-stylesheet type="text/xsl" href="xqueryx.xsl"?>
  <xqx:module xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xmlns:xqx="http://www.w3.org/2003/12/XQueryX"
              xsi:schemaLocation="http://www.w3.org/2003/12/XQueryX xqueryx.xsd">
    <xqx:mainModule>
      <xqx:queryBody>
        <xqx:expr xsi:type="xqx:elementConstructor">
          <xqx:tagName>bib</xqx:tagName>
          <xqx:elementContent>
            <xqx:expr xsi:type="xqx:flwrExpr">
              <xqx:forClause>
                <xqx:forClauseItem>
                  <xqx:typedVariableBinding>
                    <xqx:varName>b</xqx:varName>
                  </xqx:typedVariableBinding>
                  <xqx:forExpr>
                    <xqx:expr xsi:type="xqx:pathExpr">
                      <xqx:expr xsi:type="xqx:functionCallExpr">
                        <xqx:functionName>document</xqx:functionName>
                        <xqx:parameters>
                          <xqx:expr xsi:type="xqx:stringConstantExpr">
                            <xqx:value>bib.xml</xqx:value>
                          </xqx:expr>
                        </xqx:parameters>
                      </xqx:expr>
                      <xqx:stepExpr>
                        <xqx:xpathAxis>child</xqx:xpathAxis>
                        <xqx:elementTest>
                          <xqx:nodeName> <xqx:QName>bib</xqx:QName> </xqx:nodeName>
                        </xqx:elementTest>
                      </xqx:stepExpr>
                      <xqx:stepExpr>
                        <xqx:xpathAxis>child</xqx:xpathAxis>
                        <xqx:elementTest>
                          <xqx:nodeName> <xqx:QName>book</xqx:QName> </xqx:nodeName>
                        </xqx:elementTest>
                      </xqx:stepExpr>
                    </xqx:expr>
                  </xqx:forExpr>
                </xqx:forClauseItem>
              </xqx:forClause>
              <xqx:whereClause>
                <xqx:expr xsi:type="xqx:operatorExpr" xqx:infix="true">
                  <xqx:opType>AND</xqx:opType>
                  <xqx:parameters>
                    <xqx:expr xsi:type="xqx:operatorExpr" xqx:infix="true">
                      <xqx:opType>=</xqx:opType>
                      <xqx:parameters>
                        <xqx:expr xsi:type="xqx:pathExpr">
                          <xqx:expr xsi:type="xqx:variable">
                            <xqx:name>b</xqx:name>
                          </xqx:expr>
                          <xqx:stepExpr>
                            <xqx:xpathAxis>child</xqx:xpathAxis>
                            <xqx:elementTest>
                              <xqx:nodeName> <xqx:QName>publisher</xqx:QName> </xqx:nodeName>
                            </xqx:elementTest>
                          </xqx:stepExpr>
                        </xqx:expr>
                        <xqx:expr xsi:type="xqx:stringConstantExpr">
                          <xqx:value>Addison-Wesley</xqx:value>
                        </xqx:expr>
                      </xqx:parameters>
                    </xqx:expr>
                    <xqx:expr xsi:type="xqx:operatorExpr" xqx:infix="true">
                      <xqx:opType>></xqx:opType>
                      <xqx:parameters>
                        <xqx:expr xsi:type="xqx:pathExpr">
                          <xqx:expr xsi:type="xqx:variable">
                            <xqx:name>b</xqx:name>
                          </xqx:expr>
                          <xqx:stepExpr>
                            <xqx:xpathAxis>child</xqx:xpathAxis>
                            <xqx:attributeTest>
                              <xqx:nodeName> <xqx:QName>year</xqx:QName> </xqx:nodeName>
                            </xqx:attributeTest>
                          </xqx:stepExpr>
                        </xqx:expr>
                        <xqx:expr xsi:type="xqx:integerConstantExpr">
                          <xqx:value>1991</xqx:value>
                        </xqx:expr>
                      </xqx:parameters>
                    </xqx:expr>
                  </xqx:parameters>
                </xqx:expr>
              </xqx:whereClause>
              <xqx:returnClause>
                <xqx:expr xsi:type="xqx:elementConstructor">
                  <xqx:tagName>book</xqx:tagName>
                  <xqx:attributeList>
                    <xqx:expr xsi:type="xqx:attributeConstructor">
                      <xqx:attributeName>year</xqx:attributeName>
                      <xqx:attributeValue>
                        <xqx:expr xsi:type="xqx:pathExpr">
                          <xqx:expr xsi:type="xqx:variable">
                            <xqx:name>b</xqx:name>
                          </xqx:expr>
                          <xqx:stepExpr>
                            <xqx:xpathAxis>child</xqx:xpathAxis>
                            <xqx:attributeTest>
                              <xqx:nodeName> <xqx:QName>year</xqx:QName> </xqx:nodeName>
                            </xqx:attributeTest>
                          </xqx:stepExpr>
                        </xqx:expr>
                      </xqx:attributeValue>
                    </xqx:expr>
                  </xqx:attributeList>
                  <xqx:elementContent>
                    <xqx:expr xsi:type="xqx:pathExpr">
                      <xqx:expr xsi:type="xqx:pathExpr">
                        <xqx:expr xsi:type="xqx:variable">
                          <xqx:name>b</xqx:name>
                        </xqx:expr>
                        <xqx:stepExpr>
                          <xqx:xpathAxis>child</xqx:xpathAxis>
                          <xqx:elementTest>
                            <xqx:nodeName> <xqx:QName>title</xqx:QName> </xqx:nodeName>
                          </xqx:elementTest>
                        </xqx:stepExpr>
                      </xqx:expr>
                    </xqx:expr>
                  </xqx:elementContent>
                </xqx:expr>
              </xqx:returnClause>
            </xqx:expr>
          </xqx:elementContent>
        </xqx:expr>
      </xqx:queryBody>
    </xqx:mainModule>
  </xqx:module>

When To Use Declarative XML

XML As User Interface
• Development teams often forget to write use cases for administrators and downstream developers
• XML declarative languages must be considered as ways to provide user interfaces
• Documentation and examples are important
• Usability and learnability issues are important

Declarative XML In App. Lifecycle
• Compile-Time: Best Performance, Least Flexibility
• Start-Time: XML Parsed Once, Good Performance, Good Flexibility. App must be restarted or manually refreshed to get flexibility
• Run-Time: Can Impact Performance, But Great Flexibility. Web Services Approach

Start-Time Configuration
[Diagram]

Runtime Configuration
[Diagram]

Anti-Pattern: The Overly Abstract Database
• By using XML, you can achieve runtime configurability
• Instead of “hard coding” SQL table names, do the same job in XML configuration files
• Problem: the database is so abstract that it is very hard or impossible to do SQL queries against it directly
• The database is closed: it is just a persistence extension to the application
• The application (and thus, the app dev team) assumes all responsibility for any work with the data

The Split Persistence Data Model

Applications Have Persistence Data Models
• Persistence Data Model: any data stored on disk
• Everything might be in the DB
• Everything might be in XML docs
• Usually, there is some split, e.g., 95% in a SQL DB, but 5% in a properties file
• Managing the split between DB-based data and XML-based data is “accidental complexity” and can be a headache

Traditional Architecture
[Diagram: an Application and a SQL Database containing several Tables]

Typical Architecture
[Diagram: an Application, a SQL Database containing several Tables, and several XML documents]

Joins between data models
• Can use XQuery to join the two data models
• Could use XPath-enhanced SQL to join the two models
• Often, the two models are joined at the object level

Databases Have Meta Data
[Diagram: an Application, a SQL Database (Meta Data plus Tables), and an XML Datastore (XML documents)]

Databases Have Metadata
• Column names, table names & constraints are all metadata
• Defined at DB design time, which is usually app design time
• The app is often hard coded against the database metadata

XML as DB Metadata
[Diagram: an Application, a SQL Database (Meta Data plus Tables), and an XML Datastore (XML documents), with a link labeled “Describes”]

Pros vs. Cons
• More flexible: XML can change at runtime
• More complex at the application level
• Database must be more abstract
• Can make the application harder to extend
• Don’t forget: a DB table can also serve the same purpose as an XML metadata doc

Alternative: Tables As Meta Data
[Diagram: an Application, a SQL Database (Meta Data plus Tables, including tables used as metadata), and an XML Datastore (XML documents), with a link labeled “Describes”]
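
Looking back at the “Joins between data models” slide, the most common option (joining at the object level) can be sketched as follows. Everything specific here is an assumption for illustration: the `DEPT_XML` document, its `deptno` attribute, and the `Employee` class are hypothetical, and the `emp`/`dept` tables simply reuse the names from the SQL examples.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Relational side: the structured data lives in SQL tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp (ename TEXT, deptno INTEGER REFERENCES dept(deptno));
    INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH');
    INSERT INTO emp VALUES ('CLARK', 10), ('SMITH', 20);
""")

# XML side: a hypothetical semi-structured document keyed by deptno.
DEPT_XML = """
<departments>
  <department deptno="10"><description>Keeps the books.</description></department>
  <department deptno="20"><description>Runs the lab.</description></department>
</departments>
"""


@dataclass
class Employee:
    """Active (object) model that combines both persistent models."""
    ename: str
    dname: str
    dept_description: str


# Index the XML by the shared key so it can be matched against SQL rows.
descriptions = {
    int(d.get("deptno")): d.findtext("description")
    for d in ET.fromstring(DEPT_XML).iter("department")
}

# The SQL join stays in the database; the XML is attached in application code.
rows = conn.execute(
    "SELECT ename, emp.deptno, dname FROM emp, dept "
    "WHERE emp.deptno = dept.deptno ORDER BY ename"
)
employees = [
    Employee(ename, dname, descriptions.get(deptno, ""))
    for ename, deptno, dname in rows
]

for e in employees:
    print(e.ename, e.dname, e.dept_description)
```

The cost of this approach is the one the deck names: the application, not the datastore, owns the join, so every consumer of the combined data has to go through the application code.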