Ernestina Menasalvas
Facultad de Informática. Universidad Politécnica de Madrid

Sources
The slides are taken from:
• Database System Concepts ©Silberschatz, Korth and Sudarshan. See www.db-book.com for conditions on re-use
• XML tutorial at http://www.w3schools.com/xml/default.asp (the slides with no footnote)

AGENDA
• Introduction
• Structure of XML Data
• XML Document Schema
• Querying and Transformation
• Application Program Interfaces to XML
• Storage of XML Data
• XML Applications

Introduction

What is XML?
• XML stands for EXtensible Markup Language
• XML is a markup language much like HTML
• XML was designed to describe data
• XML tags are not predefined. You must define your own tags
• Defined by the WWW Consortium (W3C)
• Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML
• Documents have tags giving extra information about sections of the document
  – E.g. <title> XML </title> <slide> Introduction …</slide>
• XML uses a Document Type Definition (DTD) or an XML Schema to describe the data
• XML with a DTD or XML Schema is designed to be self-descriptive

The Main Difference Between XML and HTML
• XML was designed to carry data.
• XML is not a replacement for HTML. XML and HTML were designed with different goals:
  – XML was designed to describe data and to focus on what data is. HTML was designed to display data and to focus on how data looks.
  – HTML is about displaying information, while XML is about describing information.
• XML is a Complement to HTML
• It is important to understand that XML is not a replacement for HTML. In future Web development it is most likely that XML will be used to describe the data, while HTML will be used to format and display the same data.
• XML is a cross-platform, software- and hardware-independent tool for transmitting information.

How can XML be used
• It is important to understand that XML was designed to store, carry, and exchange data.
  XML was not designed to display data.
• XML can Separate Data from HTML
• With XML, your data is stored outside your HTML.
• When HTML is used to display data, the data is stored inside your HTML. With XML, data can be stored in separate XML files. This way you can concentrate on using HTML for data layout and display, and be sure that changes in the underlying data will not require any changes to your HTML.
• XML data can also be stored inside HTML pages as "Data Islands". You can still concentrate on using HTML only for formatting and displaying the data.

Why use XML
• XML is Used to Exchange Data
  – With XML, data can be exchanged between incompatible systems.
• XML and B2B
  – With XML, financial information can be exchanged over the Internet.
• XML Can be Used to Share Data
  – With XML, plain text files can be used to share data.
• XML Can be Used to Store Data
  – With XML, plain text files can be used to store data.
• XML Can Make your Data More Useful
  – With XML, your data is available to more users.
• XML Can be Used to Create New Languages
  – XML is the mother of WAP and WML.
• If Developers Have Sense
  – If they DO have sense, all future applications will exchange their data in XML.

XML: applications
• Data interchange is critical in today’s networked world
  – Examples:
    • Banking: funds transfer
    • Order processing (especially inter-company orders)
    • Scientific data
      – Chemistry: ChemML, …
      – Genetics: BSML (Bio-Sequence Markup Language), …
  – Paper flow of information between organizations is being replaced by electronic flow of information
• Each application area has its own set of standards for representing information
• XML has become the basis for all new generation data interchange formats
Database System Concepts - 5th Edition, Aug 22, 2005. 10.9 ©Silberschatz, Korth and Sudarshan

XML applications (Cont.)
• Earlier generation formats were based on plain text with line headers indicating the meaning of fields
  – Similar in concept to email headers
  – Does not allow for nested structures, no standard “type” language
  – Tied too closely to low level document structure (lines, spaces, etc)
• Each XML based standard defines what are valid elements, using
  – XML type specification languages to specify the syntax
    • DTD (Document Type Definition)
    • XML Schema
  – Plus textual descriptions of the semantics
• XML allows new tags to be defined as required
  – However, this may be constrained by DTDs
• A wide variety of tools is available for parsing, browsing and querying XML documents/data

XML Does not DO Anything
• XML was not designed to DO anything.
• Maybe it is a little hard to understand, but XML does not DO anything. XML was created to structure, store and to send information.
• The following example is a note to Tove from Jani, stored as XML:
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
• The note has a header and a message body. It also has sender and receiver information. But still, this XML document does not DO anything. It is just pure information wrapped in XML tags.
• Someone must write a piece of software to send, receive or display it.

Comparison with Relational Data
• Inefficient: tags, which in effect represent schema information, are repeated
• Better than relational tuples as a data-exchange format
  – Unlike relational tuples, XML data is self-documenting due to presence of tags
  – Non-rigid format: tags can be added
  – Allows nested structures
  – Wide acceptance, not only in database systems, but also in browsers, tools, and applications

Syntax

XML syntax rules
• Rules are very simple and very strict.
  They are easy to learn and very easy to use. XML documents use a self-describing and simple syntax.
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
• The first line in the document - the XML declaration - defines the XML version and the character encoding used in the document. In this case the document conforms to the 1.0 specification of XML and uses the ISO-8859-1 (Latin-1/West European) character set.
• The next line describes the root element of the document (as if it were saying: "this document is a note"): <note>
• The next 4 lines describe 4 child elements of the root (to, from, heading, and body).
• And finally the last line defines the end of the root element: </note>

XML syntax rules
• All XML Elements Must Have a Closing Tag
  – With XML, it is illegal to omit the closing tag.
  – In HTML some elements do not have to have a closing tag:
      <p>This is a paragraph
      <p>This is another paragraph
  – In XML every element must be closed:
      <p>This is a paragraph</p>
  – The XML declaration does not have a closing tag. This is not an error. The declaration is not a part of the XML document itself. It is not an XML element, and it should not have a closing tag.
• XML Tags are Case Sensitive
  – Unlike HTML, XML is case sensitive: the tag <Letter> is different from the tag <letter>.
• XML Elements Must be Properly Nested
    <b><i>This text is bold and italic</i></b>
• Comments in XML are similar to those in HTML.
    <!-- This is a comment -->

XML syntax rules
• All XML documents must contain a single tag pair to define a root element.
  – All other elements must be within this root element.
  – All elements can have sub elements (child elements). Sub elements must be correctly nested within their parent element:
      <root>
        <child>
          <subchild>.....</subchild>
        </child>
      </root>
• XML Attribute Values Must be Quoted
• With XML, it is illegal to omit quotation marks around attribute values.
• XML elements can have attributes in name/value pairs just like in HTML.
  In XML the attribute value must always be quoted. The XML document below is correct because the value of the date attribute is quoted; writing date=12/11/2002 without the quotes would be an error:
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <note date="12/11/2002">
      <to>Tove</to>
      <from>Jani</from>
    </note>
• With XML, the white space in your document is not truncated. This is unlike HTML.

XML is Free and Extensible
• XML tags are not predefined. You must "invent" your own tags.
• The tags used to mark up HTML documents and the structure of HTML documents are predefined. The author of HTML documents can only use tags that are defined in the HTML standard (like <p>, <h1>, etc.).
• XML allows the author to define his own tags and his own document structure.
• The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.

XML tags
• The ability to specify new tags, and to create nested tag structures, makes XML a great way to exchange data, not just documents.
  – Much of the use of XML has been in data exchange applications, not as a replacement for HTML
• Tags make data (relatively) self-documenting
  – E.g.
      <bank>
        <account>
          <account_number> A-101 </account_number>
          <branch_name> Downtown </branch_name>
          <balance> 500 </balance>
        </account>
        <depositor>
          <account_number> A-101 </account_number>
          <customer_name> Johnson </customer_name>
        </depositor>
      </bank>

Structure of XML Data

Structure of XML Data
• Tag: label for a section of data
• Element: section of data beginning with <tagname> and ending with matching </tagname>
• Elements must be properly nested
  – Proper nesting
    • <account> … <balance> …. </balance> </account>
  – Improper nesting
    • <account> … <balance> …. </account> </balance>
  – Formally: every start tag must have a unique matching end tag that is in the context of the same parent element.
• Every document must have a single top-level element

Elements are extensible
• XML documents can be extended to carry more information.
• Look at the following XML NOTE example:
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <body>Don't forget me this weekend!</body>
    </note>
• Let's imagine that we created an application to produce this output:
    MESSAGE
    To: Tove
    From: Jani
    Don't forget me this weekend!

Elements are extensible (cont.)
• Imagine that the author of the XML document added some extra information to it:
    <note>
      <date>2002-08-01</date>
      <to>Tove</to>
      <from>Jani</from>
      <body>Don't forget me this weekend!</body>
    </note>
• Should the application break or crash?
• No.
• The application should still be able to find the <to>, <from>, and <body> elements in the XML document and produce the same output.

Elements have Content
• An XML element is everything from (including) the element's start tag to (including) the element's end tag.
• An element can have:
  – element content,
  – mixed content,
  – simple content,
  – or empty content.
  – An element can also have attributes.

Elements have Content (cont.)
    <book>
      <title>My First XML</title>
      <prod id="33-657" media="paper"></prod>
      <chapter>Introduction to XML
        <para>What is HTML</para>
        <para>What is XML</para>
      </chapter>
      <chapter>XML Syntax
        <para>Elements must have a closing tag</para>
        <para>Elements must be properly nested</para>
      </chapter>
    </book>
• book has element content, because it contains other elements.
• chapter has mixed content, because it contains both text and other elements.
• para has simple content (or text content).
• prod has empty content, because it carries no information.
• In the example above only the prod element has attributes. The attribute named id has the value "33-657". The attribute named media has the value "paper".
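The four kinds of content listed above can be told apart mechanically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser; the helper name content_kind is hypothetical and the sample is a shortened copy of the book example, not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the <book> example (one chapter only).
BOOK = """<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"></prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
</book>"""


def content_kind(elem):
    """Classify an element as 'element', 'mixed', 'simple' or 'empty' (hypothetical helper)."""
    has_children = len(elem) > 0
    # Text can sit before the first child (elem.text) or after any child (child.tail).
    pieces = [elem.text] + [child.tail for child in elem]
    has_text = any(p and p.strip() for p in pieces)
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"


root = ET.fromstring(BOOK)
kinds = {e.tag: content_kind(e) for e in root.iter()}
prod_attrs = root.find("prod").attrib

print(kinds)
print(prod_attrs)
```

Whitespace-only text is ignored here, which is a simplification: it treats indentation between child elements as layout rather than content.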
Element Naming
• XML elements must follow these naming rules:
  – Names can contain letters, numbers, and other characters
  – Names must not start with a number or punctuation character
  – Names must not start with the letters xml (or XML, or Xml, etc)
  – Names cannot contain spaces
• Beyond that, follow these simple rules:
  – Any name can be used, no words are reserved, but the idea is to make names descriptive.
  – Avoid "-" and "." in names (software may read "-" as a subtraction and "." as a property access).
  – Element names can be as long as you like.
  – XML documents often have a corresponding database, in which fields exist corresponding to elements in the XML document. Use the naming rules of your database for the elements in the XML documents.
  – Non-English letters like éòá are perfectly legal in XML element names, but watch out for problems if your software vendor doesn't support them.
  – The ":" should not be used in element names because it is reserved to be used for something called namespaces

XML elements attributes
• XML elements can have attributes in the start tag, just like HTML.
• Attributes are used to provide additional information about elements.
• Recall from HTML: <IMG SRC="computer.gif">. The SRC attribute provides additional information about the IMG element.
• In HTML (and in XML) attributes provide additional information about elements.
• Attributes often provide information that is not a part of the data.
• In the example below, the file type is irrelevant to the data, but important to the software that wants to manipulate the element:
    <file type="gif">computer.gif</file>
• Attribute values must always be enclosed in quotes, but either single or double quotes can be used.
    <person sex="female">
    <person sex='female'>

Attributes
• Elements can have attributes
    <account acct-type = "checking" >
      <account_number> A-102 </account_number>
      <branch_name> Perryridge </branch_name>
      <balance> 400 </balance>
    </account>
• Attributes are specified by name=value pairs inside the starting tag of an element
• An element may have several attributes, but each attribute name can only occur once
    <account acct-type = "checking" monthly-fee="5">

Use of Elements vs. Attributes
• Data can be stored in child elements or in attributes.
• Take a look at these examples:
    <person sex="female">
      <firstname>Anna</firstname>
      <lastname>Smith</lastname>
    </person>

    <person>
      <sex>female</sex>
      <firstname>Anna</firstname>
      <lastname>Smith</lastname>
    </person>
• In the first example sex is an attribute. In the last, sex is a child element. Both examples provide the same information.
• There are no rules about when to use attributes, and when to use child elements.
• In XML use child elements if the information feels like data.

Attributes vs. Subelements
• Distinction between subelement and attribute
  – In the context of documents, attributes are part of markup, while subelement contents are part of the basic document contents
  – In the context of data representation, the difference is unclear and may be confusing
• Same information can be represented in two ways
  – <account account_number = "A-101"> …. </account>
  – <account> <account_number>A-101</account_number> … </account>
  – Suggestion: use attributes for identifiers of elements, and use subelements for contents

Avoid using attributes?
• Some of the problems with using attributes are:
  – attributes cannot contain multiple values (child elements can)
  – attributes are not easily expandable (for future changes)
  – attributes cannot describe structures (child elements can)
  – attributes are more difficult to manipulate by program code
  – attribute values are not easy to test against a Document Type Definition (DTD) - which is used to define the legal elements of an XML document

Avoid using attributes? (cont.)
• If you use attributes as containers for data, you end up with documents that are difficult to read and maintain.
• Try to use elements to describe data.
• Use attributes only to provide information that is not relevant to the data.
• Don't end up like this (this is not how XML should be used):
    <note day="12" month="11" year="2002" to="Tove" from="Jani" heading="Reminder" body="Don't forget me this weekend!">
    </note>
• Rule of thumb: metadata (data about data) should be stored as attributes, and the data itself should be stored as elements.

Elements can be nested
    <bank-1>
      <customer>
        <customer_name> Hayes </customer_name>
        <customer_street> Main </customer_street>
        <customer_city> Harrison </customer_city>
        <account>
          <account_number> A-102 </account_number>
          <branch_name> Perryridge </branch_name>
          <balance> 400 </balance>
        </account>
        <account> … </account>
      </customer>
      …
    </bank-1>
Motivation for Nesting
• Nesting of data is useful in data transfer
  – Example: elements representing customer_id, customer_name, and address nested within an order element
• Nesting is not supported, or discouraged, in relational databases
  – With multiple orders, customer name and address are stored redundantly
  – Normalization replaces nested structures in each order by a foreign key into the table storing customer name and address information
  – Nesting is supported in object-relational databases
• But nesting is appropriate when transferring data
  – External application does not have direct access to data referenced by a foreign key

Elements and text
• Mixture of text with sub-elements is legal in XML.
  – Example:
      <account>
        This account is seldom used any more.
        <account_number> A-102</account_number>
        <branch_name> Perryridge</branch_name>
        <balance>400 </balance>
      </account>
  – Useful for document markup, but discouraged for data representation

Namespaces
• XML data has to be exchanged between organizations
• Same tag name may have different meaning in different organizations, causing confusion on exchanged documents
• Specifying a unique string as an element name avoids confusion
• Better solution: use unique-name:element-name
• Avoid using long unique names all over document by using XML Namespaces
    <bank xmlns:FB='http://www.FirstBank.com'>
      …
      <FB:branch>
        <FB:branchname>Downtown</FB:branchname>
        <FB:branchcity> Brooklyn </FB:branchcity>
      </FB:branch>
      …
    </bank>
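To illustrate how a program sees the namespace prefix described above, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser. The sample is the bank fragment with the elided parts left out, and the local prefix fb is an arbitrary choice made for this example.

```python
import xml.etree.ElementTree as ET

# The bank fragment from the Namespaces slide, without the elided parts.
DOC = """<bank xmlns:FB="http://www.FirstBank.com">
  <FB:branch>
    <FB:branchname>Downtown</FB:branchname>
    <FB:branchcity>Brooklyn</FB:branchcity>
  </FB:branch>
</bank>"""

root = ET.fromstring(DOC)

# The prefix used for lookups is local to this program; only the URI must match.
ns = {"fb": "http://www.FirstBank.com"}
branch = root.find("fb:branch", ns)
name = branch.find("fb:branchname", ns).text

# Internally the parser replaces the prefix by the full namespace URI.
expanded_tag = branch.tag

print(name)
print(expanded_tag)
```

The prefix FB is only shorthand inside the document: two documents that bind different prefixes to the same URI are talking about the same elements.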
More on XML Syntax
• Elements without subelements or text content can be abbreviated by ending the start tag with a /> and deleting the end tag
  – <account number="A-101" branch="Perryridge" balance="200" />
• To store string data that may contain tags, without the tags being interpreted as subelements, use CDATA as below
  – <![CDATA[<account> … </account>]]>
    Here, <account> and </account> are treated as just strings
    CDATA stands for “character data”

DTD

Well Formed XML Documents
• A "Well Formed" XML document has correct XML syntax.
• A "Well Formed" XML document is a document that conforms to the XML syntax rules that were described in the previous chapters:
  – XML documents must have a root element
  – XML elements must have a closing tag
  – XML tags are case sensitive
  – XML elements must be properly nested
  – XML attribute values must always be quoted
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

Valid XML Documents
• A "Valid" XML document also conforms to a DTD.
• A "Valid" XML document is a "Well Formed" XML document, which also conforms to the rules of a Document Type Definition (DTD):
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <!DOCTYPE note SYSTEM "InternalNote.dtd">
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

DTD vs SCHEMA
• A DTD defines the legal elements of an XML document.
• The purpose of a DTD is to define the legal building blocks of an XML document. It defines the document structure with a list of legal elements.
• Why Use a DTD?
  – With a DTD, each of your XML files can carry a description of its own format.
  – With a DTD, independent groups of people can agree to use a standard DTD for interchanging data.
  – Your application can use a standard DTD to verify that the data you receive from the outside world is valid.
  – You can also use a DTD to verify your own data.
• W3C supports an XML based alternative to DTD called XML Schema

Document Type Definition (DTD)
• The type of an XML document can be specified using a DTD
• A DTD constrains the structure of XML data
  – What elements can occur
  – What attributes can/must an element have
  – What subelements can/must occur inside each element, and how many times.
• DTD does not constrain data types
  – All values represented as strings in XML
• DTD syntax
  – <!ELEMENT element (subelements-specification) >
  – <!ATTLIST element (attributes) >

DTD introduction
• A DTD can be declared inline inside an XML document, or as an external reference.
• If the DTD is declared inside the XML file, it should be wrapped in a DOCTYPE definition with the following syntax:
    <!DOCTYPE root-element [element-declarations]>
• Example XML document with an internal DTD:
    <?xml version="1.0"?>
    <!DOCTYPE note [
      <!ELEMENT note (to,from,heading,body)>
      <!ELEMENT to (#PCDATA)>
      <!ELEMENT from (#PCDATA)>
      <!ELEMENT heading (#PCDATA)>
      <!ELEMENT body (#PCDATA)>
    ]>
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend</body>
    </note>

DTD introduction (cont)
• External DTD Declaration
  – If the DTD is declared in an external file, it should be wrapped in a DOCTYPE definition with the following syntax:
    • <!DOCTYPE root-element SYSTEM "filename">
• This is the same XML document as above, but with an external DTD:
    <?xml version="1.0"?>
    <!DOCTYPE note SYSTEM "note.dtd">
    <note>
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>
• And this is the file "note.dtd" which contains the DTD:
    <!ELEMENT note (to,from,heading,body)>
    <!ELEMENT to (#PCDATA)>
    <!ELEMENT from (#PCDATA)>
    <!ELEMENT heading (#PCDATA)>
    <!ELEMENT body (#PCDATA)>

DTD - XML Building Blocks
• Seen from a DTD point of view, all XML documents (and HTML documents) are made up of the following building blocks:
  – Elements
  – Attributes
  – Entities
  – PCDATA
  – CDATA

Elements and attributes
• Elements are the main building blocks of both XML and HTML documents.
• Examples of HTML elements are "body" and "table". Examples of XML elements could be "note" and "message". Elements can contain text, other elements, or be empty. Examples of empty HTML elements are "hr", "br" and "img". Examples:
    <body>some text</body>
    <message>some text</message>
• Attributes provide extra information about elements. Attributes are always placed inside the opening tag of an element. Attributes always come in name/value pairs. The following "img" element has additional information about a source file:
    <img src="computer.gif" />
  The name of the element is "img". The name of the attribute is "src". The value of the attribute is "computer.gif". Since the element itself is empty it is closed by a " /".

Entities
• Some characters have a special meaning in XML, like the less than sign (<) that defines the start of an XML tag.
• Most of you know the HTML entity: "&nbsp;". This "no-breaking-space" entity is used in HTML to insert an extra space in a document. Entities are expanded when a document is parsed by an XML parser.
• The following entities are predefined in XML:
    Entity Reference    Character
    &lt;                <
    &gt;                >
    &amp;               &
    &quot;              "
    &apos;              '

Entities
• Entities are variables used to define shortcuts to standard text or special characters.
• Entity references are references to entities
• Entities can be declared internal or external

An Internal Entity Declaration
• Syntax:
    <!ENTITY entity-name "entity-value">
• DTD example:
    <!ENTITY writer "Donald Duck.">
    <!ENTITY copyright "Copyright W3Schools.">
• XML example:
    <author>&writer;&copyright;</author>
• Note: An entity has three parts: an ampersand (&), an entity name, and a semicolon (;).
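Entity expansion can be observed directly with a parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree (a non-validating parser that expands internal entities declared in the document's own DTD subset). The document wraps the writer and copyright declarations from the slide in an internal DOCTYPE; the trailing predefined entities are added for illustration only.

```python
import xml.etree.ElementTree as ET

# Internal DTD subset declaring the two entities from the slide,
# followed by an element that also uses some predefined entities.
DOC = """<?xml version="1.0"?>
<!DOCTYPE author [
  <!ENTITY writer "Donald Duck.">
  <!ENTITY copyright "Copyright W3Schools.">
]>
<author>&writer;&copyright; &lt;tag&gt; &amp; &quot;quotes&quot;</author>"""

root = ET.fromstring(DOC)

# By the time the application sees the text, every entity reference
# has been replaced by its value.
text = root.text
print(text)
```

Referencing an entity that was never declared (for example &unknown;) makes this parser raise a ParseError, which is one reason the declarations must travel with the document or be reachable through its DTD.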
An External Entity Declaration
• Syntax:
    <!ENTITY entity-name SYSTEM "URI/URL">
• DTD example:
    <!ENTITY writer SYSTEM "http://www.w3schools.com/entities.dtd">
    <!ENTITY copyright SYSTEM "http://www.w3schools.com/entities.dtd">
• XML example:
    <author>&writer;&copyright;</author>

PCDATA and CDATA
• PCDATA means parsed character data.
  – Think of character data as the text found between the start tag and the end tag of an XML element.
  – PCDATA is text that WILL be parsed by a parser. The text will be examined by the parser for entities and markup.
  – Tags inside the text will be treated as markup and entities will be expanded.
  – However, parsed character data should not contain any &, <, or > characters; these need to be represented by the &amp;, &lt; and &gt; entities, respectively.
• CDATA means character data.
  – CDATA is text that will NOT be parsed by a parser. Tags inside the text will NOT be treated as markup and entities will not be expanded.

Empty Elements and PCDATA
• Empty elements are declared with the category keyword EMPTY:
    <!ELEMENT element-name EMPTY>
• Example:
    <!ELEMENT br EMPTY>
• XML example:
    <br />
• Elements with Parsed Character Data
• Elements with only parsed character data are declared with #PCDATA inside parentheses:
    <!ELEMENT element-name (#PCDATA)>
• Example:
    <!ELEMENT from (#PCDATA)>

More on elements
• Elements declared with the category keyword ANY can contain any combination of parsable data:
    <!ELEMENT element-name ANY>
  Example:
    <!ELEMENT note ANY>
• Elements with one or more children (sequences) are declared with the names of the child elements inside parentheses:
    <!ELEMENT element-name (child1)>
  or
    <!ELEMENT element-name (child1,child2,...)>
• Example:
    <!ELEMENT note (to,from,heading,body)>
• When children are declared in a sequence separated by commas, the children must appear in the same sequence in the document. In a full declaration, the children must also be declared, and the children can also have children.
  The full declaration of the "note" element is:
    <!ELEMENT note (to,from,heading,body)>
    <!ELEMENT to (#PCDATA)>
    <!ELEMENT from (#PCDATA)>
    <!ELEMENT heading (#PCDATA)>
    <!ELEMENT body (#PCDATA)>

Elements (cont)
• Declaring Only One Occurrence of an Element
    <!ELEMENT element-name (child-name)>
    <!ELEMENT note (message)>
• The example above declares that the child element "message" must occur once, and only once inside the "note" element.
• Declaring Minimum One Occurrence of an Element
    <!ELEMENT element-name (child-name+)>
    <!ELEMENT note (message+)>
• The + sign in the example above declares that the child element "message" must occur one or more times inside the "note" element.
• Declaring Zero or More Occurrences of an Element
    <!ELEMENT element-name (child-name*)>
    <!ELEMENT note (message*)>
• The * sign in the example above declares that the child element "message" can occur zero or more times inside the "note" element.

More on elements
• Declaring Zero or One Occurrences of an Element
    <!ELEMENT element-name (child-name?)>
    <!ELEMENT note (message?)>
• The ? sign in the example above declares that the child element "message" can occur zero or one time inside the "note" element.
• Declaring either/or Content
    <!ELEMENT note (to,from,header,(message|body))>
• The example above declares that the "note" element must contain a "to" element, a "from" element, a "header" element, and either a "message" or a "body" element.
• Declaring Mixed Content
    <!ELEMENT note (#PCDATA|to|from|header|message)*>
• The example above declares that the "note" element can contain zero or more occurrences of parsed character data, "to", "from", "header", or "message" elements.

Bank DTD
    <!DOCTYPE bank [
      <!ELEMENT bank ( ( account | customer | depositor)+)>
      <!ELEMENT account (account_number, branch_name, balance)>
      <!ELEMENT customer (customer_name, customer_street, customer_city)>
      <!ELEMENT depositor (customer_name, account_number)>
      <!ELEMENT account_number (#PCDATA)>
      <!ELEMENT branch_name (#PCDATA)>
      <!ELEMENT balance (#PCDATA)>
      <!ELEMENT customer_name (#PCDATA)>
      <!ELEMENT customer_street (#PCDATA)>
      <!ELEMENT customer_city (#PCDATA)>
    ]>

Declaring Attributes
• An attribute declaration has the following syntax:
    <!ATTLIST element-name attribute-name attribute-type default-value>
• DTD example:
    <!ATTLIST payment type CDATA "check">
• XML example:
    <payment type="check" />

Attribute Specification in DTD
• Attribute specification: for each attribute
  – Name
  – Type of attribute
    • CDATA
    • ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)
      – more on this later
  – Whether
    • mandatory (#REQUIRED)
    • has a default value (value),
    • or neither (#IMPLIED)
• Examples
  – <!ATTLIST account acct-type CDATA "checking">
  – <!ATTLIST customer customer_id ID #REQUIRED accounts IDREFS #REQUIRED>

The attribute-type can be one of the following
Type  Description
• CDATA  The value is character data
• (en1|en2|..)
  The value must be one from an enumerated list
• ID  The value is a unique id
• IDREF  The value is the id of another element
• IDREFS  The value is a list of other ids
• NMTOKEN  The value is a valid XML name token
• NMTOKENS  The value is a list of valid XML name tokens
• ENTITY  The value is an entity
• ENTITIES  The value is a list of entities
• NOTATION  The value is a name of a notation
• xml:  The value is a predefined xml value

Attributes
• The default-value can be one of the following:
    Value           Explanation
    value           The default value of the attribute
    #REQUIRED       The attribute is required
    #IMPLIED        The attribute is not required
    #FIXED value    The attribute value is fixed
• DTD:
    <!ELEMENT square EMPTY>
    <!ATTLIST square width CDATA "0">
• Valid XML:
    <square width="100" />
• In the example above, the "square" element is defined to be an empty element with a "width" attribute of type CDATA. If no width is specified, it has a default value of 0.

attributes
• #REQUIRED
• Syntax:
    <!ATTLIST element-name attribute-name attribute-type #REQUIRED>
• DTD:
    <!ATTLIST person number CDATA #REQUIRED>
• Valid XML:
    <person number="5677" />
• Invalid XML:
    <person />
• Use the #REQUIRED keyword if you don't have an option for a default value, but still want to force the attribute to be present.

attributes
• #IMPLIED
• Syntax:
    <!ATTLIST element-name attribute-name attribute-type #IMPLIED>
• DTD:
    <!ATTLIST contact fax CDATA #IMPLIED>
• Valid XML:
    <contact fax="555-667788" />
• Valid XML:
    <contact />
• Use the #IMPLIED keyword if you don't want to force the author to include an attribute, and you don't have an option for a default value.
• #FIXED
• Syntax:
    <!ATTLIST element-name attribute-name attribute-type #FIXED "value">
• DTD:
    <!ATTLIST sender company CDATA #FIXED "Microsoft">
• Valid XML:
    <sender company="Microsoft" />
• Invalid XML:
    <sender company="W3Schools" />
• Use the #FIXED keyword when you want an attribute to have a fixed value without allowing the author to change it.
  If an author includes another value, the XML parser will return an error.

attributes
• Enumerated Attribute Values
• Syntax:
    <!ATTLIST element-name attribute-name (en1|en2|..) default-value>
• DTD example:
    <!ATTLIST payment type (check|cash) "cash">
• XML example:
    <payment type="check" />
  or
    <payment type="cash" />
• Use enumerated attribute values when you want the attribute value to be one of a fixed set of legal values.

IDs and IDREFs
• An element can have at most one attribute of type ID
• The ID attribute value of each element in an XML document must be distinct
  – Thus the ID attribute value is an object identifier
• An attribute of type IDREF must contain the ID value of an element in the same document
• An attribute of type IDREFS contains a set of (0 or more) ID values. Each ID value must contain the ID value of an element in the same document

Bank DTD with Attributes
• Bank DTD with ID and IDREF attribute types.
    <!DOCTYPE bank-2 [
      <!ELEMENT account (branch, balance)>
      <!ATTLIST account
          account_number ID     #REQUIRED
          owners         IDREFS #REQUIRED>
      <!ELEMENT customer (customer_name, customer_street, customer_city)>
      <!ATTLIST customer
          customer_id ID     #REQUIRED
          accounts    IDREFS #REQUIRED>
      … declarations for branch, balance, customer_name, customer_street and customer_city
    ]>
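The ID and IDREFS rules above are normally enforced by a validating parser. As a rough illustration of what such a check involves, here is a hand-rolled sketch, assuming Python's standard xml.etree.ElementTree (which does not validate against a DTD). The two lookup tables mirror the ATTLIST declarations of the bank-2 DTD, and the sample data is a reduced, hypothetical bank-2 document with the child elements omitted.

```python
import xml.etree.ElementTree as ET

# Reduced, hypothetical bank-2 data: attributes only, child elements omitted.
DOC = """<bank-2>
  <account account_number="A-401" owners="C100 C102"/>
  <customer customer_id="C100" accounts="A-401"/>
  <customer customer_id="C102" accounts="A-401 A-402"/>
</bank-2>"""

# Which attribute plays the ID / IDREFS role per element, as in the DTD above.
ID_ATTRS = {"account": "account_number", "customer": "customer_id"}
IDREFS_ATTRS = {"account": "owners", "customer": "accounts"}

root = ET.fromstring(DOC)

# Rule 1: every ID value must be distinct within the document.
ids = [e.get(ID_ATTRS[e.tag]) for e in root if e.tag in ID_ATTRS]
duplicates = {i for i in ids if ids.count(i) > 1}

# Rule 2: every value in an IDREFS list must be the ID of some element.
dangling = sorted(
    ref
    for e in root
    if e.tag in IDREFS_ATTRS
    for ref in e.get(IDREFS_ATTRS[e.tag], "").split()
    if ref not in ids
)

print("duplicate IDs:", duplicates)
print("dangling references:", dangling)
```

In this reduced sample the reference A-402 has no matching account element, so the check reports it; a validating parser would reject such a document for the same reason.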
XML data with ID and IDREF attributes
    <bank-2>
      <account account_number="A-401" owners="C100 C102">
        <branch_name> Downtown </branch_name>
        <balance> 500 </balance>
      </account>
      <customer customer_id="C100" accounts="A-401">
        <customer_name> Joe </customer_name>
        <customer_street> Monroe </customer_street>
        <customer_city> Madison </customer_city>
      </customer>
      <customer customer_id="C102" accounts="A-401 A-402">
        <customer_name> Mary </customer_name>
        <customer_street> Erin </customer_street>
        <customer_city> Newark </customer_city>
      </customer>
    </bank-2>

Example: TV Schedule DTD
• By David Moisan. Copied from http://www.davidmoisan.org/
    <!DOCTYPE TVSCHEDULE [
      <!ELEMENT TVSCHEDULE (CHANNEL+)>
      <!ELEMENT CHANNEL (BANNER,DAY+)>
      <!ELEMENT BANNER (#PCDATA)>
      <!ELEMENT DAY (DATE,(HOLIDAY|PROGRAMSLOT+)+)>
      <!ELEMENT HOLIDAY (#PCDATA)>
      <!ELEMENT DATE (#PCDATA)>
      <!ELEMENT PROGRAMSLOT (TIME,TITLE,DESCRIPTION?)>
      <!ELEMENT TIME (#PCDATA)>
      <!ELEMENT TITLE (#PCDATA)>
      <!ELEMENT DESCRIPTION (#PCDATA)>
      <!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED>
      <!ATTLIST CHANNEL CHAN CDATA #REQUIRED>
      <!ATTLIST PROGRAMSLOT VTR CDATA #IMPLIED>
      <!ATTLIST TITLE RATING CDATA #IMPLIED>
      <!ATTLIST TITLE LANGUAGE CDATA #IMPLIED>
    ]>

Newspaper Article DTD
• Copied from http://www.vervet.com/
    <!DOCTYPE NEWSPAPER [
      <!ELEMENT NEWSPAPER (ARTICLE+)>
      <!ELEMENT ARTICLE (HEADLINE,BYLINE,LEAD,BODY,NOTES)>
      <!ELEMENT HEADLINE (#PCDATA)>
      <!ELEMENT BYLINE (#PCDATA)>
      <!ELEMENT LEAD (#PCDATA)>
      <!ELEMENT BODY (#PCDATA)>
      <!ELEMENT NOTES (#PCDATA)>
      <!ATTLIST ARTICLE AUTHOR CDATA #REQUIRED>
      <!ATTLIST ARTICLE EDITOR CDATA #IMPLIED>
      <!ATTLIST ARTICLE DATE CDATA #IMPLIED>
      <!ATTLIST ARTICLE EDITION CDATA #IMPLIED>
      <!ENTITY NEWSPAPER "Vervet Logic Times">
      <!ENTITY PUBLISHER "Vervet Logic Press">
      <!ENTITY COPYRIGHT "Copyright 1998 Vervet Logic Press">
    ]>

Limitations of DTDs
• No typing of text elements and attributes
  – All values are strings, no integers, reals, etc.
• Difficult to specify unordered sets of subelements
  – Order is usually irrelevant in databases (unlike in the document-layout environment from which XML evolved)
  – (A | B)* allows specification of an unordered set, but
    • Cannot ensure that each of A and B occurs only once
• IDs and IDREFs are untyped
  – The owners attribute of an account may contain a reference to another account, which is meaningless
    • The owners attribute should ideally be constrained to refer to customer elements

XML Schema
• XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. It supports
  – Typing of values
    • E.g. integer, string, etc.
    • Also, constraints on min/max values
  – User-defined, complex types
  – Many more features, including
    • uniqueness and foreign key constraints, inheritance
• XML Schema is itself specified in XML syntax, unlike DTDs
  – More-standard representation, but verbose
• XML Schema is integrated with namespaces
• BUT: XML Schema is significantly more complicated than DTDs.

Schemas
• XML Schema is an XML-based alternative to DTD.
• An XML schema describes the structure of an XML document.
• The XML Schema language is also referred to as XML Schema Definition (XSD).
• The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD.
• An XML Schema:
  – defines elements that can appear in a document
  – defines attributes that can appear in a document
  – defines which elements are child elements
  – defines the order of child elements
  – defines the number of child elements
  – defines whether an element is empty or can include text
  – defines data types for elements and attributes
  – defines default and fixed values for elements and attributes

An XML Schema
• The following XML Schema file, "note.xsd", defines the elements of the XML document "note.xml":
    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
               targetNamespace="http://www.w3schools.com"
               xmlns="http://www.w3schools.com"
               elementFormDefault="qualified">
      <xs:element name="note">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="to" type="xs:string"/>
            <xs:element name="from" type="xs:string"/>
            <xs:element name="heading" type="xs:string"/>
            <xs:element name="body" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>

A Reference to an XML Schema
• This XML document has a reference to an XML Schema:
    <?xml version="1.0"?>
    <note xmlns="http://www.w3schools.com"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.w3schools.com note.xsd">
      <to>Tove</to>
      <from>Jani</from>
      <heading>Reminder</heading>
      <body>Don't forget me this weekend!</body>
    </note>

Simple elements
• XML Schemas define the elements of your XML files.
• A simple element is an XML element that can contain only text. It cannot contain any other elements or attributes.
• However, the "only text" restriction is quite misleading. The text can be of many different types. It can be one of the types included in the XML Schema definition (boolean, string, date, etc.), or it can be a custom type that you can define yourself.
• You can also add restrictions (facets) to a data type in order to limit its content, or you can require the data to match a specific pattern. Defining a Simple Element • The syntax for defining a simple element is: <xs:element name="xxx" type="yyy"/> • • where xxx is the name of the element and yyy is the data type of the element. XML Schema has a lot of built-in data types. The most common types are: xs:string xs:decimal xs:integer xs:boolean xs:date xs:time Example • Here are some XML elements: <lastname>Refsnes</lastname> <age>36</age> <dateborn>1970-03-27</dateborn> • And here are the corresponding simple element definitions: <xs:element name="lastname" type="xs:string"/> <xs:element name="age" type="xs:integer"/> <xs:element name="dateborn" type="xs:date"/> Default and Fixed Values for Simple Elements • Simple elements may have a default value OR a fixed value specified. • A default value is automatically assigned to the element when no other value is specified. • In the following example the default value is "red": <xs:element name="color" type="xs:string" default="red"/> • A fixed value is also automatically assigned to the element, and you cannot specify another value. • In the following example the fixed value is "red": <xs:element name="color" type="xs:string" fixed="red"/> Attributes • Simple elements cannot have attributes. If an element has attributes, it is considered to be of a complex type. But the attribute itself is always declared as a simple type. • The syntax for defining an attribute is: <xs:attribute name="xxx" type="yyy"/> where xxx is the name of the attribute and yyy specifies the data type of the attribute. • The most common types are: xs:string xs:decimal xs:integer xs:boolean xs:date xs:time Attributes (cont.) 
• Here is an XML element with an attribute:
    <lastname lang="EN">Smith</lastname>
• And here is the corresponding attribute definition:
    <xs:attribute name="lang" type="xs:string"/>
• Default and fixed values are declared in the same way as for elements

Restrictions (facets)
• Restrictions are used to define acceptable values for XML elements or attributes.
• Restrictions on XML elements are called facets.

Restrictions on Values
• The following example defines an element called "age" with a restriction. The value of age cannot be lower than 0 or greater than 120:
    <xs:element name="age">
      <xs:simpleType>
        <xs:restriction base="xs:integer">
          <xs:minInclusive value="0"/>
          <xs:maxInclusive value="120"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>

Restrictions on a Set of Values
• To limit the content of an XML element to a set of acceptable values, we would use the enumeration constraint.
• The example below defines an element called "car" with a restriction. The only acceptable values are: Audi, Golf, BMW:
    <xs:element name="car">
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="Audi"/>
          <xs:enumeration value="Golf"/>
          <xs:enumeration value="BMW"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:element>
• Many other restrictions can be set on data types (length, values, set of values)

What is a Complex Element?
• A complex element is an XML element that contains other elements and/or attributes.
• There are four kinds of complex elements:
  – empty elements
  – elements that contain only other elements
  – elements that contain only text
  – elements that contain both other elements and text
• Note: Each of these elements may contain attributes as well!

How to Define a Complex Element
• Look at this complex XML element, "employee", which contains only other elements:
    <employee>
      <firstname>John</firstname>
      <lastname>Smith</lastname>
    </employee>
• We can define a complex element in an XML Schema in two different ways:
1.
The "employee" element can be declared directly by naming the element, like this:
    <xs:element name="employee">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="firstname" type="xs:string"/>
          <xs:element name="lastname" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

Complex element definition
2. The "employee" element can have a type attribute that refers to the name of the complex type to use:
    <xs:element name="employee" type="personinfo"/>
    <xs:complexType name="personinfo">
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>

Complex elements
• If you use the method described above, several elements can refer to the same complex type, like this:
    <xs:element name="employee" type="personinfo"/>
    <xs:element name="student" type="personinfo"/>
    <xs:element name="member" type="personinfo"/>
    <xs:complexType name="personinfo">
      <xs:sequence>
        <xs:element name="firstname" type="xs:string"/>
        <xs:element name="lastname" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>

More features of XML Schema
• Attributes are specified by the xs:attribute tag:
  – <xs:attribute name="account_number"/>
  – adding use="required" means a value must be specified
• Key constraint: "account numbers form a key for account elements under the root bank element":
    <xs:key name="accountKey">
      <xs:selector xpath="/bank/account"/>
      <xs:field xpath="account_number"/>
    </xs:key>
• Foreign key constraint from depositor to account:
    <xs:keyref name="depositorAccountKey" refer="accountKey">
      <xs:selector xpath="/bank/depositor"/>
      <xs:field xpath="account_number"/>
    </xs:keyref>
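What the key and foreign key constraints above demand of a document can be spelled out in a few lines of Python. This is a sketch of the two checks, not of how a schema validator implements xs:key and xs:keyref. The flat bank document (including the dangling account number A-999) and the helper name check_key_and_keyref are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

# Flat bank document made up for this sketch; the second depositor
# refers to an account number that does not exist.
DOC = """
<bank>
  <account><account_number>A-101</account_number><balance>500</balance></account>
  <account><account_number>A-102</account_number><balance>400</balance></account>
  <depositor><customer_name>Joe</customer_name><account_number>A-101</account_number></depositor>
  <depositor><customer_name>Mary</customer_name><account_number>A-999</account_number></depositor>
</bank>
"""


def check_key_and_keyref(xml_text):
    """Return (key_ok, missing_refs) for a flat bank document (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # root is the bank element
    # Key: account_number must be present and unique among bank/account.
    keys = [a.findtext("account_number") for a in root.findall("account")]
    key_set = set(keys)
    key_ok = None not in key_set and len(keys) == len(key_set)
    # Keyref: every bank/depositor account_number must match some key value.
    missing = [
        d.findtext("account_number")
        for d in root.findall("depositor")
        if d.findtext("account_number") not in key_set
    ]
    return key_ok, missing


key_ok, missing = check_key_and_keyref(DOC)
print("key holds:", key_ok)
print("depositor references without an account:", missing)
```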
An XSD example
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <shiporder orderid="889923"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:noNamespaceSchemaLocation="shiporder.xsd">
      <orderperson>John Smith</orderperson>
      <shipto>
        <name>Ola Nordmann</name>
        <address>Langgt 23</address>
        <city>4000 Stavanger</city>
        <country>Norway</country>
      </shipto>
      <item>
        <title>Empire Burlesque</title>
        <note>Special Edition</note>
        <quantity>1</quantity>
        <price>10.90</price>
      </item>
      <item>
        <title>Hide your heart</title>
        <quantity>1</quantity>
        <price>9.90</price>
      </item>
    </shiporder>

Create an XML Schema of the example
• Now we want to create a schema for the XML document shown above
• We start by opening a new file that we will call "shiporder.xsd". To create the schema we could simply follow the structure in the XML document and define each element as we find it. We will start with the standard XML declaration followed by the xs:schema element that defines a schema:
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      ...
      ...
    </xs:schema>
• In the schema above we use the standard namespace (xs), and the URI associated with this namespace is the Schema language definition, which has the standard value of http://www.w3.org/2001/XMLSchema.

Example (cont.)
• Next, we have to define the "shiporder" element. This element has an attribute and it contains other elements, therefore we consider it a complex type. The child elements of the "shiporder" element are surrounded by an xs:sequence element that defines an ordered sequence of sub elements:
    <xs:element name="shiporder">
      <xs:complexType>
        <xs:sequence>
          ...
          ...
        </xs:sequence>
        ...
      </xs:complexType>
    </xs:element>

Example (cont.)
• Then we have to define the "orderperson" element as a simple type (because it does not contain any attributes or other elements).
The type (xs:string) is prefixed with the namespace prefix associated with XML Schema that indicates a predefined schema data type: <xs:element name="orderperson" type="xs:string"/> • Next, we have to define two elements that are of the complex type: "shipto" and "item". We start by defining the "shipto" element: <xs:element name="shipto"> <xs:complexType> <xs:sequence> <xs:element name="name" type="xs:string"/> <xs:element name="address" type="xs:string"/> <xs:element name="city" type="xs:string"/> <xs:element name="country" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element> example • Now we can define the "item" element. This element can appear multiple times inside a "shiporder" element. This is specified by setting the maxOccurs attribute of the "item" element to "unbounded" which means that there can be as many occurrences of the "item" element as the author wishes. Notice that the "note" element is optional. We have specified this by setting the minOccurs attribute to zero: <xs:element name="item" maxOccurs="unbounded"> <xs:complexType> <xs:sequence> <xs:element name="title" type="xs:string"/> <xs:element name="note" type="xs:string" minOccurs="0"/> <xs:element name="quantity" type="xs:positiveInteger"/> <xs:element name="price" type="xs:decimal"/> </xs:sequence> </xs:complexType> </xs:element> example • We can now declare the attribute of the "shiporder" element. Since this is a required attribute we specify use="required". 
• Note: The attribute declarations must always come last: • <xs:attribute name="orderid" type="xs:string" use="required"/> example the complete listing of the schema file called "shiporder.xsd": <?xml version="1.0" encoding="ISO-8859-1" ?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="shiporder"> <xs:complexType> <xs:sequence> <xs:element name="orderperson" type="xs:string"/> <xs:element name="shipto"> <xs:complexType> <xs:sequence> <xs:element name="name" type="xs:string"/> <xs:element name="address" type="xs:string"/> <xs:element name="city" type="xs:string"/> <xs:element name="country" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element> Example (cont) <xs:element name="item" maxOccurs="unbounded"> <xs:complexType> <xs:sequence> <xs:element name="title" type="xs:string"/> <xs:element name="note" type="xs:string" minOccurs="0"/> <xs:element name="quantity" type="xs:positiveInteger"/> <xs:element name="price" type="xs:decimal"/> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> <xs:attribute name="orderid" type="xs:string" use="required"/> </xs:complexType> </xs:element> </xs:schema> Divide the Schema • The previous design method is very simple, but can be difficult to read and maintain when documents are complex. The next design method is based on defining all elements and attributes first, and then referring to them using the ref attribute. 
<?xml version="1.0" encoding="ISO-8859-1" ?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <!-- definition of simple elements --> <xs:element name="orderperson" type="xs:string"/> <xs:element name="name" type="xs:string"/> <xs:element name="address" type="xs:string"/> <xs:element name="city" type="xs:string"/> <xs:element name="country" type="xs:string"/> <xs:element name="title" type="xs:string"/> <xs:element name="note" type="xs:string"/> <xs:element name="quantity" type="xs:positiveInteger"/> <xs:element name="price" type="xs:decimal"/> <!-- definition of attributes --> <xs:attribute name="orderid" type="xs:string"/> Divide the Schema <!-- definition of complex elements --> <xs:element name="shipto"> <xs:complexType> <xs:sequence> <xs:element ref="name"/> <xs:element ref="address"/> <xs:element ref="city"/> <xs:element ref="country"/> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="item"> <xs:complexType> <xs:sequence> <xs:element ref="title"/> <xs:element ref="note" minOccurs="0"/> <xs:element ref="quantity"/> <xs:element ref="price"/> </xs:sequence> </xs:complexType> </xs:element> … … Querying and transforming Querying and Transforming XML Data • • • • Translation of information from one XML schema to another Querying on XML data Above two are closely related, and handled by the same tools Standard XML querying/translation languages – XPath • Simple language consisting of path expressions – XSLT • Simple language designed for translation from XML to XML and XML to HTML – XQuery • An XML query language with a rich set of features Database System Concepts - 5th Edition, Aug 22, 2005. 
10.103 ©Silberschatz, Korth and Sudarshan Tree Model of XML Data • Query and transformation languages are based on a tree model of XML data • An XML document is modeled as a tree, with nodes corresponding to elements and attributes – Element nodes have child nodes, which can be attributes or subelements – Text in an element is modeled as a text node child of the element – Children of a node are ordered according to their order in the XML document – Element and attribute nodes (except for the root node) have a single parent, which is an element node – The root node has a single child, which is the root element of the document Database System Concepts - 5th Edition, Aug 22, 2005. 10.104 ©Silberschatz, Korth and Sudarshan XPath • XPath is used to address (select) parts of documents using path expressions • A path expression is a sequence of steps separated by “/” – Think of file names in a directory hierarchy • Result of path expression: set of values that along with their containing elements/attributes match the specified path • E.g. /bank-2/customer/customer_name evaluated on the bank-2 data we saw earlier returns <customer_name>Joe</customer_name> <customer_name>Mary</customer_name> • E.g. /bank-2/customer/customer_name/text( ) returns the same names, but without the enclosing tags Database System Concepts - 5th Edition, Aug 22, 2005. 10.105 ©Silberschatz, Korth and Sudarshan What is XPath? • • • • • • XPath is a syntax for defining parts of an XML document XPath uses path expressions to navigate in XML documents XPath contains a library of standard functions XPath is a major element in XSLT XPath is a W3C Standard XPath Path Expressions – XPath uses path expressions to select nodes or node-sets in an XML document. These path expressions look very much like the expressions you see when you work with a traditional computer file system. • XPath Standard Functions – XPath includes over 100 built-in functions. 
There are functions for string values, numeric values, date and time comparison, node and QName manipulation, sequence manipulation, Boolean values, and more. XPath Terminology • Nodes: In XPath, there are seven kinds of nodes: element, attribute, text, namespace, processing-instruction, comment, and document (root) nodes. XML documents are treated as trees of nodes. • The root of the tree is called the document node (or root node). • Look at the following XML document: <?xml version="1.0" encoding="ISO-8859-1"?> <bookstore> <book> <title lang="en">Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book> </bookstore> Example of nodes in the XML document above • <bookstore> (document node) • <author>J K. Rowling</author> (element node) • lang="en" (attribute node) • Atomic values • Atomic values are nodes with no children or parent. • Example of atomic values: • J K. Rowling • "en" Relationship of Nodes • Parent • Each element and attribute has one parent. • In the following example; the book element is the parent of the title, author, year, and price: <book> <title>Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book> Relationship of Nodes • Children • Element nodes may have zero, one or more children. • In the following example; the title, author, year, and price elements are all children of the book element: <book> <title>Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book> Relationship of Nodes • Siblings – Nodes that have the same parent. – In the following example; the title, author, year, and price elements are all siblings: <book> <title>Harry Potter</title> <author>J K. Rowling</author> <year>2005</year> <price>29.99</price> </book> • Ancestors – – A node's parent, parent's parent, etc. 
In the following example, the ancestors of the title element are the book element and the bookstore element:
    <bookstore>
      <book>
        <title>Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
      </book>
    </bookstore>

Relationship of Nodes
• Descendants
• A node's children, children's children, etc.
• In the following example, descendants of the bookstore element are the book, title, author, year, and price elements:
    <bookstore>
      <book>
        <title>Harry Potter</title>
        <author>J K. Rowling</author>
        <year>2005</year>
        <price>29.99</price>
      </book>
    </bookstore>

XPath syntax
• The XML Example Document
• We will use the following XML document in the examples below.
    <?xml version="1.0" encoding="ISO-8859-1"?>
    <bookstore>
      <book>
        <title lang="eng">Harry Potter</title>
        <price>29.99</price>
      </book>
      <book>
        <title lang="eng">Learning XML</title>
        <price>39.95</price>
      </book>
    </bookstore>

Selecting Nodes
• XPath uses path expressions to select nodes in an XML document. The node is selected by following a path or steps. The most useful path expressions are listed below:

    Expression   Description
    nodename     Selects all nodes with the name "nodename"
    /            Selects from the root node
    //           Selects nodes in the document from the current node that match the selection no matter where they are
    .            Selects the current node
    ..           Selects the parent of the current node
    @            Selects attributes

Selecting nodes examples

    Path Expression   Result
    bookstore         Selects all nodes with the name "bookstore"
    /bookstore        Selects the root element bookstore
    bookstore/book    Selects all book elements that are children of bookstore
    //book            Selects all book elements no matter where they are in the document
    bookstore//book   Selects all book elements that are descendants of the bookstore element, no matter where they are under the bookstore element
    //@lang           Selects all attributes that are named lang

XPath (Cont.)
• The initial “/” denotes root of the document (above the top-level tag)
• Path expressions are evaluated left to right
  – Each step operates on the set of instances produced by the previous step
• Selection predicates may follow any step in a path, in [ ]
  – E.g. /bank-2/account[balance > 400]
    • returns account elements with a balance value greater than 400
    • /bank-2/account[balance] returns account elements containing a balance subelement
• Attributes are accessed using “@”
  – E.g. /bank-2/account[balance > 400]/@account_number
    • returns the account numbers of accounts with balance > 400
  – IDREF attributes are not dereferenced automatically (more on this later)

Functions in XPath
• XPath provides several functions
  – The function count() at the end of a path counts the number of elements in the set generated by the path
    • E.g. /bank-2/account[count(./customer) > 2]
      – Returns accounts with > 2 customers
  – Also function for testing position (1, 2, ..) of node w.r.t. siblings
• Boolean connectives and and or and function not() can be used in predicates
• IDREFs can be referenced using function id()
  – id() can also be applied to sets of references such as IDREFS and even to strings containing multiple references separated by blanks
  – E.g. /bank-2/account/id(@owners)
    • returns all customers referred to from the owners attribute of account elements.

More XPath Features
• Operator “|” used to implement union
  – E.g. /bank-2/account/id(@owners) | /bank-2/loan/id(@borrower)
    • Gives customers with either accounts or loans
    • However, “|” cannot be nested inside other operators.
• “//” can be used to skip multiple levels of nodes
  – E.g. /bank-2//customer_name
    • finds any customer_name element anywhere under the /bank-2 element, regardless of the element in which it is contained.
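Several of the path expressions above can be tried from Python. The sketch below uses the standard-library xml.etree.ElementTree, which supports only a subset of XPath: there is no text(), no attribute step, no numeric comparison inside predicates and no id(), so those parts are done in plain Python and marked as such in the comments. The document, including the second account, is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# bank-2 style document made up for this sketch.
DOC = """
<bank-2>
  <account account_number="A-401" owners="C100 C102">
    <branch_name>Downtown</branch_name>
    <balance>500</balance>
  </account>
  <account account_number="A-402" owners="C102">
    <branch_name>Perryridge</branch_name>
    <balance>300</balance>
  </account>
  <customer customer_id="C100" accounts="A-401">
    <customer_name>Joe</customer_name>
  </customer>
  <customer customer_id="C102" accounts="A-401 A-402">
    <customer_name>Mary</customer_name>
  </customer>
</bank-2>
"""

root = ET.fromstring(DOC)  # root is the bank-2 element, so paths are relative to it

# /bank-2/customer/customer_name/text(): take .text in Python instead of text().
names = [e.text for e in root.findall("customer/customer_name")]

# /bank-2/account[balance]: predicate testing for a child element.
with_balance = root.findall("account[balance]")

# /bank-2/account[balance > 400]/@account_number: the comparison and the
# attribute access are done in Python, not in the path expression.
rich = [a.get("account_number") for a in root.findall("account")
        if float(a.findtext("balance")) > 400]

# /bank-2//customer_name: "//" skips any number of levels.
anywhere = [e.text for e in root.findall(".//customer_name")]

# /bank-2/account/id(@owners) done by hand: look up each referenced customer_id.
by_id = {c.get("customer_id"): c for c in root.findall("customer")}
owner_ids = {ref for a in root.findall("account") for ref in a.get("owners").split()}
owners = sorted(by_id[ref].findtext("customer_name") for ref in owner_ids)

print(names, len(with_balance), rich, anywhere, owners)
```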
• A step in the path can go to parents, siblings, ancestors and descendants of the nodes generated by the previous step, not just to the children
  – “//”, described above, is a short form for specifying “all descendants”
  – “..” specifies the parent.
• doc(name) returns the root of a named document

What is XQuery?
• XQuery is the language for querying XML data
• XQuery for XML is like SQL for databases
• XQuery is built on XPath expressions
• XQuery is supported by all the major database engines (IBM, Oracle, Microsoft, etc.)
• XQuery is a W3C Recommendation
• XQuery can be used to:
  – Extract information to use in a Web Service
  – Generate summary reports
  – Transform XML data to XHTML
  – Search Web documents for relevant information

XQuery
• XQuery is a general purpose query language for XML data
• Currently being standardized by the World Wide Web Consortium (W3C)
  – The textbook description is based on a January 2005 draft of the standard. The final version may differ, but major features are likely to stay unchanged.
• XQuery is derived from the Quilt query language, which itself borrows from SQL, XQL and XML-QL
• XQuery uses a for … let … where … order by … return … (FLWOR) syntax
    for ⇔ SQL from
    where ⇔ SQL where
    order by ⇔ SQL order by
    return ⇔ SQL select
    let allows temporary variables, and has no equivalent in SQL

FLWOR Syntax in XQuery
• For clause uses XPath expressions, and variable in for clause ranges over values in the set returned by XPath
• Simple FLWOR expression in XQuery
  – find all accounts with balance > 400, with each result enclosed in an <account_number> ..
</account_number> tag
    for $x in /bank-2/account
    let $acctno := $x/@account_number
    where $x/balance > 400
    return <account_number> { $acctno } </account_number>
  – Items in the return clause are XML text unless enclosed in {}, in which case they are evaluated

FLWOR expression
• Let clause not really needed in this query, and selection can be done in XPath. Query can be written as:
    for $x in /bank-2/account[balance>400]
    return <account_number> { $x/@account_number } </account_number>

Joins
• Joins are specified in a manner very similar to SQL
    for $a in /bank/account,
        $c in /bank/customer,
        $d in /bank/depositor
    where $a/account_number = $d/account_number
      and $c/customer_name = $d/customer_name
    return <cust_acct> { $c, $a } </cust_acct>
• The same query can be expressed with the selections specified as XPath selections:
    for $a in /bank/account,
        $c in /bank/customer,
        $d in /bank/depositor[ account_number = $a/account_number and
                               customer_name = $c/customer_name ]
    return <cust_acct> { $c, $a } </cust_acct>

Nested Queries
• The following query converts data from the flat structure for bank information into the nested structure used in bank-1
    <bank-1> {
      for $c in /bank/customer
      return
        <customer>
          { $c/* }
          { for $d in /bank/depositor[customer_name = $c/customer_name],
                $a in /bank/account[account_number=$d/account_number]
            return $a }
        </customer>
    } </bank-1>
• $c/* denotes all the children of the node to which $c is bound, without the enclosing top-level tag
• $c/text() gives text content of an element without any subelements / tags

Sorting in XQuery
• The order by clause can be used at the end of any expression. E.g.
to return customers sorted by name
    for $c in /bank/customer
    order by $c/customer_name
    return <customer> { $c/* } </customer>
• Use order by $c/customer_name descending to sort in descending order
• Can sort at multiple levels of nesting (sort by customer_name, and by account_number within each customer)
    <bank-1> {
      for $c in /bank/customer
      order by $c/customer_name
      return
        <customer>
          { $c/* }
          { for $d in /bank/depositor[customer_name=$c/customer_name],
                $a in /bank/account[account_number=$d/account_number]
            order by $a/account_number
            return <account> { $a/* } </account> }
        </customer>
    } </bank-1>

Functions and Other XQuery Features
• User defined functions with the type system of XMLSchema
    function balances(xs:string $c) returns list(xs:decimal*) {
      for $d in /bank/depositor[customer_name = $c],
          $a in /bank/account[account_number = $d/account_number]
      return $a/balance
    }
• Types are optional for function parameters and return values
• The * (as in decimal*) indicates a sequence of values of that type
• Universal and existential quantification in where clause predicates
  – some $e in path satisfies P
  – every $e in path satisfies P
• XQuery also supports If-then-else clauses

XQuery Conditional Expressions
• "If-Then-Else" expressions are allowed in XQuery.
• Look at the following example:
    for $x in doc("books.xml")/bookstore/book
    return if ($x/@category="CHILDREN")
           then <child>{data($x/title)}</child>
           else <adult>{data($x/title)}</adult>

XQuery Conditional Expressions
• The result of the example above will be:
    <adult>Everyday Italian</adult>
    <child>Harry Potter</child>
    <adult>Learning XML</adult>
    <adult>XQuery Kick Start</adult>

Present the Result In an HTML List
• Look at the following XQuery FLWOR expression:
    for $x in doc("books.xml")/bookstore/book/title
    order by $x
    return $x
• The expression above will select all the title elements under the book elements that are under the bookstore element, and return the title elements in alphabetical order.

Present the Result In an HTML List
• Now we want to list all the book-titles in our bookstore in an HTML list. We add <ul> and <li> tags to the FLWOR expression:
    <ul>
    {
    for $x in doc("books.xml")/bookstore/book/title
    order by $x
    return <li>{$x}</li>
    }
    </ul>
• The result of the above will be:
    <ul>
    <li><title lang="en">Everyday Italian</title></li>
    <li><title lang="en">Harry Potter</title></li>
    <li><title lang="en">Learning XML</title></li>
    <li><title lang="en">XQuery Kick Start</title></li>
    </ul>

Presenting results as HTML
• Now we want to eliminate the title element, and show only the data inside the title element:
    <ul>
    {
    for $x in doc("books.xml")/bookstore/book/title
    order by $x
    return <li>{data($x)}</li>
    }
    </ul>
• The result will be (an HTML list):
    <ul>
    <li>Everyday Italian</li>
    <li>Harry Potter</li>
    <li>Learning XML</li>
    <li>XQuery Kick Start</li>
    </ul>

Accessing and storing XML

Application Program Interface
• There are two standard application program interfaces to XML data:
  – SAX (Simple API for XML)
    • Based on parser model, user provides event handlers for parsing events
      – E.g.
start of element, end of element – Not suitable for database applications – DOM (Document Object Model) • XML data is parsed into a tree representation • Variety of functions provided for traversing the DOM tree • E.g.: Java DOM API provides Node class with methods getParentNode( ), getFirstChild( ), getNextSibling( ) getAttribute( ), getData( ) (for text node) getElementsByTagName( ), … • Also provides functions for updating DOM tree Database System Concepts - 5th Edition, Aug 22, 2005. 10.140 ©Silberschatz, Korth and Sudarshan Storage of XML Data • XML data can be stored in – Non-relational data stores • Flat files – Natural for storing XML – But has all problems of files (no concurrency, no recovery, …) • XML database – Database built specifically for storing XML data, supporting DOM model and declarative querying – Currently no commercial-grade systems – Relational databases • Data must be translated into relational form • Advantage: mature database systems • Disadvantages: overhead of translating data and queries Database System Concepts - 5th Edition, Aug 22, 2005. 10.141 ©Silberschatz, Korth and Sudarshan Storage of XML in Relational Databases • Alternatives: – String Representation – Tree Representation – Map to relations Database System Concepts - 5th Edition, Aug 22, 2005. 10.142 ©Silberschatz, Korth and Sudarshan String Representation • Store each top level element as a string field of a tuple in a relational database – Use a single relation to store all elements, or – Use a separate relation for each top-level element type • E.g. account, customer, depositor relations – Each with a string-valued attribute to store the element • Indexing: – Store values of subelements/attributes to be indexed as extra fields of the relation, and build indices on these fields • E.g. customer_name or account_number – Some database systems support function indices, which use the result of a function as the key value. 
    • The function should return the value of the required subelement/attribute

String Representation (Cont.)
• Benefits:
  – Can store any XML data even without DTD
  – As long as there are many top-level elements in a document, strings are small compared to full document
    • Allows fast access to individual elements.
• Drawback: Need to parse strings to access values inside the elements
  – Parsing is slow.

Tree Representation
• Tree representation: model XML data as tree and store using relations
    nodes(id, type, label, value)
    child(child_id, parent_id)
  [Figure: example tree with nodes bank (id: 1), customer (id: 2), customer_name (id: 3), account (id: 5), account_number (id: 7)]
• Each element/attribute is given a unique identifier
• Type indicates element/attribute
• Label specifies the tag name of the element/name of attribute
• Value is the text value of the element/attribute
• The relation child notes the parent-child relationships in the tree
  – Can add an extra attribute to child to record ordering of children

Tree Representation (Cont.)
• Benefit: Can store any XML data, even without DTD
• Drawbacks:
  – Data is broken up into too many pieces, increasing space overheads
  – Even simple queries require a large number of joins, which can be slow
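The tree representation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample document BANK_XML, the function names shred_to_tree and account_numbers, the use of SQLite, and the extra position column in child are all hypothetical choices, not part of the slides. It stores every element and attribute as a row of nodes, records the edges in child, and then shows that even a one-step lookup already needs two joins.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document, loosely modeled on the bank example.
BANK_XML = """
<bank>
  <customer><customer_name>Hayes</customer_name></customer>
  <account><account_number>A-101</account_number></account>
</bank>
"""


def shred_to_tree(xml_text, conn):
    """Store each element/attribute as a nodes row and each edge as a child row."""
    conn.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, label TEXT, value TEXT)"
    )
    # position is the optional extra attribute that records child ordering.
    conn.execute(
        "CREATE TABLE child (child_id INTEGER, parent_id INTEGER, position INTEGER)"
    )
    next_id = 0

    def visit(elem, parent_id, position):
        nonlocal next_id
        next_id += 1
        node_id = next_id
        text = (elem.text or "").strip() or None
        conn.execute(
            "INSERT INTO nodes VALUES (?, 'element', ?, ?)", (node_id, elem.tag, text)
        )
        if parent_id is not None:
            conn.execute(
                "INSERT INTO child VALUES (?, ?, ?)", (node_id, parent_id, position)
            )
        # Attributes become nodes of type 'attribute' hanging off the element.
        for name, val in elem.attrib.items():
            next_id += 1
            conn.execute(
                "INSERT INTO nodes VALUES (?, 'attribute', ?, ?)", (next_id, name, val)
            )
            conn.execute("INSERT INTO child VALUES (?, ?, NULL)", (next_id, node_id))
        for pos, sub in enumerate(elem):
            visit(sub, node_id, pos)

    visit(ET.fromstring(xml_text), None, 0)


def account_numbers(conn):
    """Return the text of account_number children of account elements."""
    rows = conn.execute(
        """
        SELECT n.value
        FROM nodes AS acct
        JOIN child AS c ON c.parent_id = acct.id
        JOIN nodes AS n ON n.id = c.child_id
        WHERE acct.label = 'account' AND n.label = 'account_number'
        """
    )
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    shred_to_tree(BANK_XML, conn)
    print(account_numbers(conn))
```

Each additional step in a path expression adds another pair of joins to the query, which is the drawback noted on the slide.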

Mapping XML Data to Relations
• Relation created for each element type whose schema is known:
  – An id attribute to store a unique id for each element
  – A relation attribute corresponding to each element attribute
  – A parent_id attribute to keep track of parent element
    • As in the tree representation
    • Position information (ith child) can be stored too
• All subelements that occur only once can become relation attributes
  – For text-valued subelements, store the text as attribute value
  – For complex subelements, can store the id of the subelement
• Subelements that can occur multiple times represented in a separate table
  – Similar to handling of multivalued attributes when converting ER diagrams to tables

Storing XML Data in Relational Systems
• Publishing: process of converting relational data to an XML format
• Shredding: process of converting an XML document into a set of tuples to be inserted into one or more relations
• XML-enabled database systems support automated publishing and shredding
• Some systems offer native storage of XML data using the xml data type. Special internal data structures and indices are used for efficiency

SQL/XML
• New standard SQL extension that allows creation of nested XML output
  – Each output tuple is mapped to an XML element row
    <bank>
      <account>
        <row>
          <account_number> A-101 </account_number>
          <branch_name> Downtown </branch_name>
          <balance> 500 </balance>
        </row>
        …. more rows if there are more output tuples …
      </account>
    </bank>
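Publishing in the row-per-tuple format can be imitated outside the database. The sketch below is an assumption-laden illustration, not the SQL/XML mechanism itself: it uses SQLite and Python's xml.etree.ElementTree, and the function name publish_accounts and the sample tuple are hypothetical. It wraps each tuple of an account relation in a row element, with one subelement per column.

```python
import sqlite3
import xml.etree.ElementTree as ET


def publish_accounts(conn):
    """Publish the account relation as nested XML, one <row> per tuple."""
    bank = ET.Element("bank")
    account = ET.SubElement(bank, "account")
    cursor = conn.execute(
        "SELECT account_number, branch_name, balance FROM account"
    )
    # Column names of the result become the tag names inside each row.
    columns = [d[0] for d in cursor.description]
    for tup in cursor:
        row = ET.SubElement(account, "row")
        for col, val in zip(columns, tup):
            ET.SubElement(row, col).text = str(val)
    return ET.tostring(bank, encoding="unicode")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (account_number TEXT, branch_name TEXT, balance INTEGER)"
    )
    conn.execute("INSERT INTO account VALUES ('A-101', 'Downtown', 500)")
    print(publish_accounts(conn))
```

Shredding is the reverse direction: parse the document and insert one tuple per row element into the relation.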

SQL Extensions
• xmlelement creates XML elements
• xmlattributes creates attributes
    select xmlelement (name "account",
               xmlattributes (account_number as account_number),
               xmlelement (name "branch_name", branch_name),
               xmlelement (name "balance", balance))
    from account

Ernestina Menasalvas Facultad de Informática. Universidad Politécnica de Madrid [email protected]