Introduction to Semantic Metadata & Semantic Web

Structured Web Documents in XML

Lecture Outline
1. HTML vs. XML
2. Detailed Description of XML
3. Structuring: DTD, XML Schema
4. XML Namespaces
5. Navigating XML documents: XPath

HyperText Markup Language (HTML) vs. eXtensible Markup Language (XML)

HTML
[Figure: the example in HTML]

The Same Example in XML
[Figure: the same example in XML]
XML is for structuring/publishing data for machines

HTML versus XML: Similarities
Both use tags (e.g. <h2> and <year>)
Tags may be nested (tags within tags)
Human users can read and interpret both HTML and XML representations quite easily
… But how about machines?

Problems with Automated Interpretation of HTML Documents
An intelligent agent trying to retrieve the names of the authors of the book:
  – Authors’ names could appear immediately after the title
  – or immediately after the word "by"
  – Are there three authors?

HTML vs XML: Structural Information (1)
HTML documents do not contain structural information about content: pieces of the document and their relationships.
XML is more easily accessible to machines because
  – Every piece of information is described.
  – Relations are also defined through the nesting structure.
  – E.g., the <author> tags appear within the <book> tags, so they describe properties of the particular book.

HTML vs XML: Structural Information (2)
A machine processing the XML document would be able to deduce that
  – the author element refers to the enclosing book element
XML allows the definition of constraints on values
  – E.g. a year must be a number of four digits

HTML vs XML: Formatting
The HTML representation provides more than the XML representation:
  – The formatting of the document is also described
But this is also a weakness of HTML: content and formatting are mixed
XML: separation of content from display
  – the same information can be displayed in different ways (using XSLT style sheets)

HTML vs XML: Another Example
In HTML
    <h2>Relationship force-mass</h2>
    <i> F = M A </i>
In XML
    <equation>
      <description>Relationship force-mass</description>
      <leftside> F </leftside>
      <rightside> M A </rightside>
    </equation>

HTML vs XML: Different Use of Tags
HTML: all documents use the same tags
  – HTML tags define display: color, lists …
XML:
  – user-definable tags
  – a meta markup language: a language for defining markup languages

Lecture Outline
1. HTML vs. XML
2. Detailed Description of XML
3. Structuring: DTD, XML Schema
4. Namespaces
5. Navigating XML documents: XPath

XML Elements
The “things” the XML document talks about
  – E.g. books, authors, publishers
An element consists of:
  – an opening tag
  – the content
  – a closing tag
    <lecturer>David Billington</lecturer>

XML Elements (2)
Tag names can be chosen almost freely.
The first character must be a letter, an underscore, or a colon
No name may begin with the string “xml” in any combination of cases
  – E.g. “Xml”, “xML”

Content of XML Elements
Content may be text, or other elements, or nothing
    <lecturer>
      <name>David Billington</name>
      <phone> +61 − 7 − 3875 507 </phone>
    </lecturer>
If there is no content, then the element is called empty; it is abbreviated as follows:
    <lecturer/> for <lecturer></lecturer>

XML Attributes
An empty element is not necessarily meaningless
  – It may have some properties in terms of attributes
An attribute is a name-value pair inside the opening tag of an element
    <lecturer name="David Billington" phone="+61 − 7 − 3875 507"/>

XML Attributes: An Example
    <order orderNo="23456" customer="John Smith" date="October 15, 2002">
      <item itemNo="a528" quantity="1"/>
      <item itemNo="c817" quantity="3"/>
    </order>

The Same Example without Attributes
    <order>
      <orderNo>23456</orderNo>
      <customer>John Smith</customer>
      <date>October 15, 2002</date>
      <item>
        <itemNo>a528</itemNo>
        <quantity>1</quantity>
      </item>
      <item>
        <itemNo>c817</itemNo>
        <quantity>3</quantity>
      </item>
    </order>

XML Elements vs Attributes
Attributes can be replaced by elements
When to use elements and when attributes is a matter of design taste
But attributes cannot be nested

Well-Formed XML Documents
Syntactically correct documents
Some syntactic rules:
  – Only one outermost element (called root element)
  – Each element contains an opening and a corresponding closing tag
  – Tags may not overlap; the following is not well-formed:
        <author><name>Lee Hong</author></name>
  – Attributes within an element have unique names
  – Element and tag names must be permissible

The Tree Model of XML Documents: An Example
    <email>
      <head>
        <from name="Michael Maher" address="[email protected]"/>
        <to name="Grigoris Antoniou" address="[email protected]"/>
        <subject>Where is your draft?</subject>
      </head>
      <body>
        Grigoris, where is the draft of the paper you promised me last week?
      </body>
    </email>

The Tree Model of XML Documents: An Example (2)
[Figure: tree representation of the email document above]

Lecture Outline
1. Introduction
2. Detailed Description of XML
3. Structuring
   a) DTDs
   b) XML Schema
4. Namespaces
5. Navigating XML documents: XPath
6. Transformations: XSLT

Structuring XML Documents
Define the structure:
  – Define all the element and attribute names that may be used
  – what values an attribute may take
  – which elements may or must occur within other elements, etc.
If such structuring information exists, the document can be validated

Structuring XML Documents (2)
An XML document is valid if
  – it is well-formed
  – it respects the structuring information it uses
There are two ways of defining the structure of XML documents:
  – DTDs (the older and more restricted way)
  – XML Schema (offers extended possibilities)

Lecture Outline
1. Introduction
2. Detailed Description of XML
3. Structuring
   a) DTDs
   b) XML Schema
4. Namespaces
5. Navigating XML documents: XPath

XML Document Prolog
The prolog (declaration header) consists of
  – an XML declaration
  – a reference to external schema documents
A DTD can also be put in the XML document itself

DTD: Element Type Definition
    <lecturer>
      <name>David Billington</name>
      <phone> +61−7−3875507 </phone>
    </lecturer>
DTD for the above element (and all lecturer elements)?
    <!ELEMENT lecturer (name, phone)>
    <!ELEMENT name (#PCDATA)>
    <!ELEMENT phone (#PCDATA)>

The Meaning of the DTD
The element types lecturer, name, and phone may be used in the document
A lecturer element contains a name element and a phone element, in that order (sequence)
A name element and a phone element may contain any text (parsed character data)
In DTDs, #PCDATA is the only atomic type for elements

DTD: Disjunction in Element Type Definitions
We express that a lecturer element contains either a name element or a phone element as follows:
    <!ELEMENT lecturer (name|phone)>
A lecturer element contains a name element and a phone element in any order:
    <!ELEMENT lecturer ((name,phone)|(phone,name))>

Example of an XML Element
    <order orderNo="23456" customer="John Smith" date="October 15, 2002">
      <item itemNo="a528" quantity="1"/>
      <item itemNo="c817" quantity="3"/>
    </order>

The Corresponding DTD
    <!ELEMENT order (item+)>
    <!ATTLIST order
        orderNo   ID     #REQUIRED
        customer  CDATA  #REQUIRED
        date      CDATA  #REQUIRED>
    <!ELEMENT item EMPTY>
    <!ATTLIST item
        itemNo    ID     #REQUIRED
        quantity  CDATA  #REQUIRED
        comments  CDATA  #IMPLIED>

Comments on the DTD
The item element type is defined to be empty
+ (after item) is a cardinality operator:
  – ?: appears zero times or once
  – *: appears zero or more times
  – +: appears one or more times
  – No cardinality operator means exactly once
Note: an ID value must be a valid XML name and so cannot begin with a digit; a validating parser would reject orderNo="23456" as written.

Comments on the DTD (2)
In addition to defining elements, we define attributes
This is done in an attribute list containing:
  – Name of the element type to which the list applies
  – A list of triplets of attribute name, attribute type, and value type
Attribute name: A name that may be used in an XML document using a DTD

DTD: Attribute Types
Similar to predefined data types, but limited selection
The most important types are
  – CDATA, a string (sequence of characters)
  – ID, a name that is unique across the entire XML document
  – IDREF, a reference to another element with an ID attribute carrying the same value as the IDREF attribute
  – IDREFS, a series of IDREFs
Limitations: no dates, number ranges etc.

Referencing with IDREF and IDREFS
    <!ELEMENT family (person*)>
    <!ELEMENT person (name)>
    <!ELEMENT name (#PCDATA)>
    <!ATTLIST person
        id        ID      #REQUIRED
        mother    IDREF   #IMPLIED
        father    IDREF   #IMPLIED
        children  IDREFS  #IMPLIED>
#REQUIRED: Attribute must appear in every occurrence of the element type in the XML document
#IMPLIED: The appearance of the attribute is optional

An XML Document Respecting the DTD
    <family>
      <person id="kalsoom" mother="khalida" father="yonus">
        <name>Kalsoom Yonus</name>
      </person>
      <person id="ali" mother="khalida" father="yonus">
        <name>Muhammad Ali</name>
      </person>
      <person id="khalida" children="ali kalsoom">
        <name>Khalida Yonus</name>
      </person>
      <person id="yonus" children="ali kalsoom">
        <name>Muhammad Yonus</name>
      </person>
    </family>

XML Entities
An XML entity can play the role of
  – a placeholder for repeatable characters
  – a section of external data
We can use the entity reference &thisyear; instead of the value " 2007 "
    <!ENTITY thisyear " 2007 " >

A DTD for an Email Element
    <!ELEMENT email (head,body)>
    <!ELEMENT head (from,to+,cc*,subject)>
    <!ELEMENT from EMPTY>
    <!ATTLIST from
        name     CDATA  #IMPLIED
        address  CDATA  #REQUIRED>
    <!ELEMENT to EMPTY>
    <!ATTLIST to
        name     CDATA  #IMPLIED
        address  CDATA  #REQUIRED>

A DTD for an Email Element (2)
    <!ELEMENT cc EMPTY>
    <!ATTLIST cc
        name     CDATA  #IMPLIED
        address  CDATA  #REQUIRED>
    <!ELEMENT subject (#PCDATA)>
    <!ELEMENT body (text,attachment*)>
    <!ELEMENT text (#PCDATA)>
    <!ELEMENT attachment EMPTY>
    <!ATTLIST attachment
        file     CDATA  #REQUIRED>

Interesting Parts of the DTD
A head element contains:
  – a from element
  – at least one to element
  – zero or more cc elements
  – a subject element
In from, to, and cc elements
  – the name attribute is not required
  – the address attribute is always required
A body element contains
  – a text element
  – possibly followed by a number of attachment elements

Lecture Outline
1. Introduction
2. Detailed Description of XML
3. Structuring
   a) DTDs
   b) XML Schema
4. Namespaces
5. Navigating XML documents: XPath

XML Schema
Richer language for structuring of XML documents
Its syntax is based on XML itself
Reuse and refinement of schemas
  – Extend or restrict already existing schemas
Sophisticated set of data types, compared to DTDs (which only support strings)

XML Schema (2)
An XML schema is an element with an opening tag like
    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" ...>
      ...
    </xs:schema>
A schema consists of element and attribute types

Element Types
    <element name="head" type="headType"/>
    <element name="to" type="nameAddress" minOccurs="1" />
Cardinality constraints:
  – minOccurs="x" (default value 1)
  – maxOccurs="x" (default value 1)
Generalizations of *, ?, + offered by DTDs

Attribute Types
    <attribute name="id" type="ID" use="required"/>
    <attribute name="speaks" type="Language" default="en"/>
Existence: use="x", where x may be optional or required
Default value: default="..." or fixed="..." (pre-Recommendation drafts of XML Schema wrote this as use="x" value="...", where x is default or fixed)

Data Types
Built-in data types
  – Numerical data types: integer, short etc.
  – String types: string, ID, IDREF, CDATA etc.
  – Date and time data types: time, month etc.
User-defined data types
  – simple data types, which cannot use elements or attributes
  – complex data types, which can use elements and attributes

Data Types (2)
Complex data types are defined from existing data types by defining some attributes (if any) and using indicators:
Order indicators
  – sequence, a sequence of existing data type elements (order is important)
  – all, a collection of elements that must appear (order is not important)
  – choice, a collection of elements, of which one will be chosen
Occurrence indicators
  – maxOccurs
  – minOccurs

A Data Type Example
    <complexType name="lecturerType">
      <sequence>
        <element name="firstname" type="string"
                 minOccurs="0" maxOccurs="unbounded"/>
        <element name="lastname" type="string"/>
      </sequence>
      <attribute name="title" type="string" use="optional"/>
    </complexType>

Mixed Content Example
    <letter>
      Dear Mr.<name>John Smith</name>.
      Your order <orderid>1032</orderid>
      will be shipped on <shipdate>2001-07-13</shipdate>.
    </letter>

    <xs:element name="letter">
      <xs:complexType mixed="true">
        <xs:sequence>
          <xs:element name="name" type="xs:string"/>
          <xs:element name="orderid" type="xs:positiveInteger"/>
          <xs:element name="shipdate" type="xs:date"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

Simple Data Types
    <simpleType name="dayOfMonth">
      <restriction base="integer">
        <minInclusive value="1"/>
        <maxInclusive value="31"/>
      </restriction>
    </simpleType>

Data Type: Enumeration
    <simpleType name="dayOfWeek">
      <restriction base="string">
        <enumeration value="Mon"/>
        <enumeration value="Tue"/>
        <enumeration value="Wed"/>
        <enumeration value="Thu"/>
        <enumeration value="Fri"/>
        <enumeration value="Sat"/>
        <enumeration value="Sun"/>
      </restriction>
    </simpleType>

XML Schema: The Email Example
    <element name="email" type="emailType"/>
    <complexType name="emailType">
      <sequence>
        <element name="head" type="headType"/>
        <element name="body" type="bodyType"/>
      </sequence>
    </complexType>

XML Schema: The Email Example (2)
    <complexType name="headType">
      <sequence>
        <element name="from" type="nameAddress"/>
        <element name="to" type="nameAddress"
                 minOccurs="1" maxOccurs="unbounded"/>
        <element name="cc" type="nameAddress"
                 minOccurs="0" maxOccurs="unbounded"/>
        <element name="subject" type="string"/>
      </sequence>
    </complexType>

XML Schema: The Email Example (3)
    <complexType name="nameAddress">
      <attribute name="name" type="string" use="optional"/>
      <attribute name="address" type="string" use="required"/>
    </complexType>
Similar for bodyType

Lecture Outline
1. HTML vs. XML
2. Detailed Description of XML
3. Structuring: DTD, XML Schema
4. Namespaces
5. Navigating XML documents: XPath

Namespaces
Namespaces make it possible to uniquely identify XML vocabularies by using a uniform resource identifier (URI)
Different independent groups can define the same objects differently in their schemas/vocabularies.
  – This may lead to name clashes in XML documents that use several such schemas
A solution to this heterogeneity problem is namespaces
In XML documents, qualified names for elements and attributes are used

An Example
    <instructors xmlns="http://www.vu.com/empDTD"
                 xmlns:gu="http://www.gu.au/empDTD"
                 xmlns:uky="http://www.uky.edu/empDTD">
      <uky:faculty uky:title="assistant professor"
                   uky:name="John Smith"
                   uky:department="Computer Science"/>
      <gu:academicStaff gu:title="lecturer"
                        gu:name="Mate Jones"
                        gu:school="Information Technology"/>
    </instructors>

Namespace Declarations
This way, an XML document may use more than one DTD or schema, each having a different prefix
Namespaces are declared within an element and can be used in that element and any of its children (elements and attributes)
A namespace declaration has the form:
  – xmlns:prefix="location"
  – location is the address of the DTD or schema
If a prefix is not specified, as in xmlns="location", then the location is used by default

XML Vocabularies/Applications
Web applications must agree on common vocabularies to communicate and collaborate
Communities and business sectors are defining their specialized vocabularies
  – XHTML
  – Dublin Core (DC)
  – mathematics (MathML)
  – bioinformatics (BSML)
  – …

Lecture Outline
1. Introduction
2. Detailed Description of XML
3. Structuring: XML Schema
4. Namespaces
5. Navigating XML documents: XPath; XQuery

Addressing and Querying XML Documents
In relational databases, parts of a database can be selected and retrieved using SQL
  – The same is necessary for XML documents
  – Query languages: XQuery, XQL, XML-QL
The central concept of XML query languages is a path expression
  – It specifies how a node, or a set of nodes, in the tree representation of the XML document can be reached

XPath
XPath is the core of XML query languages
Language for addressing parts of an XML document.
  – It operates on the tree data model of XML
  – It has a non-XML syntax

Tree Structure of an XML Document
  – The root node
  – Element nodes
  – Text nodes
  – Attribute nodes
  – Comment nodes
  – …

An XML Example
    <library location="Bremen">
      <author name="Henry Wise">
        <book title="Artificial Intelligence"/>
        <book title="Modern Web Services"/>
        <book title="Theory of Computation"/>
      </author>
      <author name="William Smart">
        <book title="Artificial Intelligence" price="30"/>
      </author>
      <author name="Cynthia Singleton">
        <book title="The Semantic Web" price="40.99"/>
        <book title="Browser Technology Revised"/>
      </author>
    </library>

Tree Representation
[Figure: tree representation of the library document above]

Examples of Path Expressions in XPath
Address all author elements (absolute path)
    /library/author
Addresses all author elements that are children of the library element node

Examples of Path Expressions in XPath (2)
Address all author elements
    //author
This path expression addresses all author elements anywhere in the document

Examples of Path Expressions in XPath (3)
Address the location attribute nodes within library element nodes
    /library/@location
The symbol @ is used to denote attribute nodes
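
The three path expressions above can be tried out with a short script. This is only a sketch, assuming Python's standard xml.etree.ElementTree module, which implements a limited subset of XPath: paths are written relative to the root element (so ./author stands in for /library/author), and attribute nodes cannot be selected in a path, so the attribute is read with .get() instead. The variable names are illustrative and not part of the slides.

```python
import xml.etree.ElementTree as ET

# The library document from the "An XML Example" slide.
LIBRARY_XML = """
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence" price="30"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web" price="40.99"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
"""

root = ET.fromstring(LIBRARY_XML)  # root is the <library> element

# /library/author: author elements that are children of the library element.
child_authors = [a.get("name") for a in root.findall("./author")]

# //author: author elements anywhere below the root.
all_authors = [a.get("name") for a in root.findall(".//author")]

# /library/@location: read the attribute from the element, because
# ElementTree paths cannot address attribute nodes directly.
location = root.get("location")

print(child_authors)
print(all_authors)
print(location)
```

In this document both author queries return the same three elements, because every author is a direct child of library; they would differ if author elements were nested more deeply.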
Examples of Path Expressions in XPath (4)
Address all books with title “Artificial Intelligence”
    //book[@title="Artificial Intelligence"]
Test within square brackets: a filter expression
  – It restricts the set of addressed nodes.
  – This query (example 4) addresses book elements whose title satisfies a certain condition.

Examples of Filter Expressions (Predicates)
Address the first author element node in the XML document
    //author[1]
Address the title of the last book element within the first author element node in the document
    //author[1]/book[last()]/@title
Address the title of all book elements having a price greater than 30
    //book[@price>30]/@title
Address the title of all book elements having no price
    //book[not(@price)]/@title

A Few More Queries
Select the titles of books having “Modern” in the title
    //book[contains(@title, 'Modern')]/@title
Selecting several paths: gives the names of all authors and the titles of their books
    //author/@name | //book/@title
Returns the title and price attributes of all book elements
    //book/@title | //book/@price
Path expression with wildcard: selects all the elements in the document
    //*

XQuery
Self-study

Review
XML is a meta-language that allows users to define markup.
Nesting of tags introduces structure.
The structure of documents can be enforced using XML schemas or DTDs.
XML separates content and structure from formatting.
XML supports the exchange of structured information across different applications through markup, structure, and transformations.
XML is supported by query languages.

Literature
For XML, XML Schema, XPath, XQuery:
  – Book: XML Databases and the Semantic Web, ch 8
  – “EditiX - XML basics_xPath_xQuery.pdf”
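
The filter expressions from the XPath slides can also be checked against the library document. The following is a sketch under the assumption that only Python's standard xml.etree.ElementTree module is available; it supports attribute-equality and positional predicates, but not numeric comparisons, not(), or contains(), so those three queries are emulated with ordinary Python conditions. All variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The library document from the "An XML Example" slide.
LIBRARY_XML = """
<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence" price="30"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web" price="40.99"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>
"""

root = ET.fromstring(LIBRARY_XML)
books = list(root.iter("book"))  # all book elements, in document order

# //book[@title="Artificial Intelligence"]: supported directly.
ai_books = root.findall(".//book[@title='Artificial Intelligence']")

# //author[1]/book[last()]/@title: positional predicates are supported;
# the final attribute step is replaced by .get().
last_title = root.find("./author[1]/book[last()]").get("title")

# //book[@price>30]/@title: numeric comparison emulated in Python.
expensive = [b.get("title") for b in books
             if b.get("price") is not None and float(b.get("price")) > 30]

# //book[not(@price)]/@title: books without a price attribute.
unpriced = [b.get("title") for b in books if b.get("price") is None]

# //book[contains(@title, 'Modern')]/@title: substring test in Python.
modern = [b.get("title") for b in books if "Modern" in b.get("title", "")]

print(len(ai_books), last_title, expensive, unpriced, modern)
```

A full XPath 1.0 engine would accept the slide expressions unchanged; the emulation here is only meant to make the meaning of each predicate concrete. Note that the book priced at exactly 30 is not selected by the "greater than 30" query.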