Introduction to Semantic Metadata &
Semantic Web
Structured Web
Documents in XML
Lecture Outline
1. HTML vs. XML
2. Detailed Description of XML
3. Structuring: DTD, XML Schema
4. XML Namespaces
5. Navigating XML documents: XPath
Semantic Metadata & Semantic Web
HyperText Markup Language (HTML) vs. eXtensible Markup Language (XML)
HTML
The Same Example in XML
XML is for structuring/publishing data for machines
HTML versus XML: Similarities
Both use tags (e.g. <h2> and <year>)
Tags may be nested (tags within tags)
Human users can read and interpret both HTML and XML representations quite easily
… But how about machines?
Problems with Automated Interpretation of HTML Documents
An intelligent agent trying to retrieve the names of the authors of the book
Authors’ names could appear immediately after the title or immediately after the word "by"
Are there three authors?
HTML vs XML: Structural Information (1)
HTML documents do not contain structural information about content: pieces of the document and their relationships.
XML is more easily accessible to machines because
– Every piece of information is described.
– Relations are also defined through the nesting structure.
– E.g., the <author> tags appear within the <book> tags, so they describe properties of the particular book.
HTML vs XML: Structural Information (2)
A machine processing the XML document would be able to deduce that
– the author element refers to the enclosing book element
XML allows the definition of constraints on values
– E.g. a year must be a number of four digits
HTML vs XML: Formatting
The HTML representation provides more than the XML representation:
– The formatting of the document is also described
– But this is a weakness of HTML
XML: separation of content from display
– same information can be displayed in different ways (using XSLT style sheets)
HTML vs XML: Another Example

In HTML
<h2>Relationship force-mass</h2>
<i> F = M A </i>

In XML
<equation>
<description>Relationship force-mass</description>
<leftside> F </leftside>
<rightside> M A </rightside>
</equation>
HTML vs XML: Different Use of Tags
Both HTML examples use the same tags
– HTML tags define display: color, lists …
XML
– XML is a meta markup language for defining markup languages
– user-definable tags
Lecture Outline
1. HTML vs. XML
2. Detailed Description of XML
3. Structuring: DTD, XML Schema
4. Namespaces
5. Navigating XML documents: XPath
XML Elements
The “things” the XML document talks about
– E.g. books, authors, publishers
An element consists of:
– an opening tag
– the content
– a closing tag
<lecturer>David Billington</lecturer>
XML Elements (2)
Tag names can be chosen almost freely.
The first character must be a letter, an underscore, or a colon
No name may begin with the string “xml” in any combination of cases
– E.g. “Xml”, “xML”
Content of XML Elements
Content may be text, or other elements, or nothing
<lecturer>
<name>David Billington</name>
<phone> +61 − 7 − 3875 507 </phone>
</lecturer>
If there is no content, then the element is called empty; it is abbreviated as follows:
<lecturer/> for <lecturer></lecturer>
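A minimal sketch of how a parser sees the three kinds of element content (text, child elements, nothing). It assumes Python's standard xml.etree.ElementTree module, which the lecture itself does not use; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# An element whose content is other elements.
lecturer = ET.fromstring(
    "<lecturer>"
    "<name>David Billington</name>"
    "<phone> +61 − 7 − 3875 507 </phone>"
    "</lecturer>"
)

# Child elements are reached by iterating over the parent.
child_tags = [child.tag for child in lecturer]

# An element whose content is text exposes it through .text.
name_text = lecturer.find("name").text

# The long and the abbreviated form of an empty element parse identically:
# no children and no text.
empty_long = ET.fromstring("<lecturer></lecturer>")
empty_short = ET.fromstring("<lecturer/>")
both_empty = (
    len(empty_long) == 0
    and len(empty_short) == 0
    and empty_long.text is None
    and empty_short.text is None
)
```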
XML Attributes
An empty element is not necessarily meaningless
– It may have some properties in terms of attributes
An attribute is a name-value pair inside the opening tag of an element
<lecturer name="David Billington" phone="+61 − 7 − 3875 507"/>
XML Attributes: An Example
<order orderNo="23456" customer="John Smith"
date="October 15, 2002">
<item itemNo="a528" quantity="1"/>
<item itemNo="c817" quantity="3"/>
</order>
The Same Example without Attributes
<order>
<orderNo>23456</orderNo>
<customer>John Smith</customer>
<date>October 15, 2002</date>
<item>
<itemNo>a528</itemNo>
<quantity>1</quantity>
</item>
<item>
<itemNo>c817</itemNo>
<quantity>3</quantity>
</item>
</order>
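The two order documents carry the same information; only the access pattern differs. The sketch below reads both into the same Python dictionary. It assumes the standard xml.etree.ElementTree module, and the helper names order_from_attributes and order_from_elements are hypothetical, introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

WITH_ATTRIBUTES = """<order orderNo="23456" customer="John Smith"
       date="October 15, 2002">
  <item itemNo="a528" quantity="1"/>
  <item itemNo="c817" quantity="3"/>
</order>"""

WITH_ELEMENTS = """<order>
  <orderNo>23456</orderNo>
  <customer>John Smith</customer>
  <date>October 15, 2002</date>
  <item><itemNo>a528</itemNo><quantity>1</quantity></item>
  <item><itemNo>c817</itemNo><quantity>3</quantity></item>
</order>"""


def order_from_attributes(xml_text):
    """Read the attribute-based encoding: values come from .get()."""
    order = ET.fromstring(xml_text)
    return {
        "orderNo": order.get("orderNo"),
        "customer": order.get("customer"),
        "date": order.get("date"),
        "items": [(i.get("itemNo"), int(i.get("quantity")))
                  for i in order.findall("item")],
    }


def order_from_elements(xml_text):
    """Read the element-based encoding: values come from child text."""
    order = ET.fromstring(xml_text)
    return {
        "orderNo": order.findtext("orderNo"),
        "customer": order.findtext("customer"),
        "date": order.findtext("date"),
        "items": [(i.findtext("itemNo"), int(i.findtext("quantity")))
                  for i in order.findall("item")],
    }


from_attributes = order_from_attributes(WITH_ATTRIBUTES)
from_elements = order_from_elements(WITH_ELEMENTS)
```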
XML Elements vs Attributes
Attributes can be replaced by elements
When to use elements and when attributes is a matter of design taste
But attributes cannot be nested
Well-Formed XML Documents
Syntactically correct documents
Some syntactic rules:
– Only one outermost element (called root element)
– Each element contains an opening and a corresponding closing tag
– Tags may not overlap, e.g. this is not allowed:
<author><name>Lee Hong</author></name>
– Attributes within an element have unique names
– Element and tag names must be permissible
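Well-formedness is exactly what a non-validating parser checks. As a rough sketch, assuming Python's standard xml.etree.ElementTree module, a document can be tested by trying to parse it; the function name is_well_formed and the sample strings other than the overlapping-tags example are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Overlapping tags, as on the slide.
overlapping = "<author><name>Lee Hong</author></name>"
# The same content with properly nested tags.
nested = "<author><name>Lee Hong</name></author>"
# Two outermost elements instead of a single root.
two_roots = "<a/><b/>"
# The same attribute name used twice within one element.
duplicate_attribute = '<item itemNo="a528" itemNo="c817"/>'

results = {
    "overlapping": is_well_formed(overlapping),
    "nested": is_well_formed(nested),
    "two_roots": is_well_formed(two_roots),
    "duplicate_attribute": is_well_formed(duplicate_attribute),
}
```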
The Tree Model of XML Documents:
An Example
<email>
<head>
<from name="Michael Maher"
address="[email protected]"/>
<to name="Grigoris Antoniou"
address="[email protected]"/>
<subject>Where is your draft?</subject>
</head>
<body>
Grigoris, where is the draft of the paper you promised me
last week?
</body>
</email>
The Tree Model of XML Documents: An Example (2)
[Figure: tree representation of the email document]
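The tree behind the email document can be made visible with a short recursive walk. This is only a sketch, assuming Python's standard xml.etree.ElementTree module. The function name outline is hypothetical, and the two address values are placeholders, because the addresses are not legible in this transcript.

```python
import xml.etree.ElementTree as ET

# The address values are placeholders, not the ones from the original slide.
EMAIL = """<email>
  <head>
    <from name="Michael Maher" address="sender@example.org"/>
    <to name="Grigoris Antoniou" address="recipient@example.org"/>
    <subject>Where is your draft?</subject>
  </head>
  <body>
    Grigoris, where is the draft of the paper you promised me last week?
  </body>
</email>"""


def outline(element, depth=0):
    """List element nodes and attribute nodes (prefixed with @), indented by depth."""
    lines = ["  " * depth + element.tag]
    for attribute_name in element.attrib:
        lines.append("  " * (depth + 1) + "@" + attribute_name)
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(EMAIL))
```

Printing "\n".join(tree_lines) shows email at the root, head and body as its children, and from, to, and subject below head, each with its attribute nodes.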
Lecture Outline
1. Introduction
2. Detailed Description of XML
3. Structuring
a) DTDs
b) XML Schema
4. Namespaces
5. Navigating XML documents: XPath
6. Transformations: XSLT
Structuring XML Documents
Define the structure
– Define all the element and attribute names that may be used
– what values an attribute may take
– which elements may or must occur within other elements, etc.
If such structuring information exists, the document can be validated
Structuring XML Documents (2)
An XML document is valid if
– it is well-formed
– it respects the structuring information it uses
There are two ways of defining the structure of XML documents:
– DTDs (the older and more restricted way)
– XML Schema (offers extended possibilities)
Lecture Outline
1. Introduction
2. Detailed Description of XML
3. Structuring
a) DTDs
b) XML Schema
4. Namespaces
5. Navigating XML documents: XPath
XML Document Prolog
The declaration header consists of
– an XML declaration and
– a reference to external schema documents
A DTD can also be put in the XML document itself
DTD: Element Type Definition
<lecturer>
<name>David Billington</name>
<phone> +61−7−3875507 </phone>
</lecturer>
DTD for above element (and all lecturer elements)?
<!ELEMENT lecturer (name, phone)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT phone (#PCDATA)>
The Meaning of the DTD
The element types lecturer, name, and phone may be used in the document
A lecturer element contains a name element and a phone element, in that order (sequence)
A name element and a phone element may have any content
In DTDs, #PCDATA is the only atomic type for elements
DTD: Disjunction in Element Type Definitions
We express that a lecturer element contains either a name element or a phone element as follows:
<!ELEMENT lecturer (name|phone)>
A lecturer element contains a name element and a phone element in any order.
<!ELEMENT lecturer ((name,phone)|(phone,name))>
Example of an XML Element
<order orderNo="23456"
customer="John Smith"
date="October 15, 2002">
<item itemNo="a528" quantity="1"/>
<item itemNo="c817" quantity="3"/>
</order>
The Corresponding DTD
<!ELEMENT order (item+)>
<!ATTLIST order
  orderNo  ID    #REQUIRED
  customer CDATA #REQUIRED
  date     CDATA #REQUIRED>
<!ELEMENT item EMPTY>
<!ATTLIST item
  itemNo   ID    #REQUIRED
  quantity CDATA #REQUIRED
  comments CDATA #IMPLIED>
Comments on the DTD
The item element type is defined to be empty
+ (after item) is a cardinality operator:
– ?: appears zero times or once
– *: appears zero or more times
– +: appears one or more times
– No cardinality operator means exactly once
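The cardinality operators behave like the quantifiers of regular expressions. The sketch below makes that analogy concrete by matching the sequence of child tag names against a regex. It is a hand-rolled illustration, not DTD validation: Python's standard library does not validate against DTDs, and the helper children_match is hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def children_match(element, content_model):
    """Match the sequence of child tag names against a regex content model.

    Each tag name is followed by one space so that names stay separated.
    """
    sequence = "".join(child.tag + " " for child in element)
    return re.fullmatch(content_model, sequence) is not None


ITEM_PLUS = r"(item )+"      # like <!ELEMENT order (item+)>
ITEM_STAR = r"(item )*"      # like <!ELEMENT order (item*)>
ITEM_OPTIONAL = r"(item )?"  # like <!ELEMENT order (item?)>

two_items = ET.fromstring(
    '<order><item itemNo="a528"/><item itemNo="c817"/></order>'
)
no_items = ET.fromstring("<order/>")

checks = {
    "two items, +": children_match(two_items, ITEM_PLUS),
    "two items, *": children_match(two_items, ITEM_STAR),
    "two items, ?": children_match(two_items, ITEM_OPTIONAL),
    "no items, +": children_match(no_items, ITEM_PLUS),
    "no items, *": children_match(no_items, ITEM_STAR),
    "no items, ?": children_match(no_items, ITEM_OPTIONAL),
}
```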
Comments on the DTD (2)
In addition to defining elements, we define attributes
This is done in an attribute list containing:
– Name of the element type to which the list applies
– A list of triplets of attribute name, attribute type, and value type
Attribute name: A name that may be used in an XML document using a DTD
DTD: Attribute Types
Similar to predefined data types, but limited selection
The most important types are
– CDATA, a string (sequence of characters)
– ID, a name that is unique across the entire XML document
– IDREF, a reference to another element with an ID attribute carrying the same value as the IDREF attribute
– IDREFS, a series of IDREFs
Limitations: no dates, number ranges etc.
Referencing with IDREF and IDREFS
<!ELEMENT family (person*)>
<!ELEMENT person (name)>
<!ELEMENT name (#PCDATA)>
<!ATTLIST person
  id       ID     #REQUIRED
  mother   IDREF  #IMPLIED
  father   IDREF  #IMPLIED
  children IDREFS #IMPLIED>
#REQUIRED: Attribute must appear in every occurrence of the element type in the XML document
#IMPLIED: The appearance of the attribute is optional
An XML Document Respecting the DTD
<family>
<person id="kalsoom" mother="khalida" father="yonus">
<name>Kalsoom Yonus</name>
</person>
<person id="ali" mother="khalida" father="yonus">
<name>Muhammad Ali</name>
</person>
<person id="khalida" children="ali kalsoom">
<name>Khalida Yonus</name>
</person>
<person id="yonus" children="ali kalsoom">
<name>Muhammad Yonus</name>
</person>
</family>
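What a validating parser checks for ID, IDREF, and IDREFS can be approximated by hand: collect all id values, then make sure every reference points at one of them. The sketch assumes Python's standard xml.etree.ElementTree module (which does not itself enforce these DTD types); the function dangling_references and the BROKEN sample are hypothetical.

```python
import xml.etree.ElementTree as ET

FAMILY = """<family>
  <person id="kalsoom" mother="khalida" father="yonus">
    <name>Kalsoom Yonus</name>
  </person>
  <person id="ali" mother="khalida" father="yonus">
    <name>Muhammad Ali</name>
  </person>
  <person id="khalida" children="ali kalsoom">
    <name>Khalida Yonus</name>
  </person>
  <person id="yonus" children="ali kalsoom">
    <name>Muhammad Yonus</name>
  </person>
</family>"""

# A made-up document whose mother reference points at no existing id.
BROKEN = '<family><person id="a" mother="nobody"><name>A</name></person></family>'


def dangling_references(xml_text):
    """Return every IDREF/IDREFS value that matches no person id."""
    root = ET.fromstring(xml_text)
    known_ids = {person.get("id") for person in root.iter("person")}
    missing = []
    for person in root.iter("person"):
        references = []
        for attribute in ("mother", "father"):          # IDREF: one id
            if person.get(attribute) is not None:
                references.append(person.get(attribute))
        # IDREFS: a whitespace-separated series of ids
        references.extend(person.get("children", "").split())
        missing.extend(ref for ref in references if ref not in known_ids)
    return missing


family_missing = dangling_references(FAMILY)
broken_missing = dangling_references(BROKEN)
```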
XML Entities
An XML entity can play the role of
– a placeholder for repeatable characters
– a section of external data
We can use the entity reference &thisyear; instead of the value " 2007 "
<!ENTITY thisyear " 2007 " >
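An entity declared in the document's internal DTD subset is expanded by the parser wherever it is referenced. A small sketch, assuming Python's standard xml.etree.ElementTree module; the note element and its text are invented for the example, and the entity value is written without the surrounding spaces.

```python
import xml.etree.ElementTree as ET

# The DTD is placed inside the document itself (internal subset).
DOC = """<!DOCTYPE note [
  <!ENTITY thisyear "2007">
]>
<note>Copyright &thisyear; by the authors</note>"""

note = ET.fromstring(DOC)

# The parser has already replaced the entity reference by its value.
expanded_text = note.text
```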
A DTD for an Email Element
<!ELEMENT email (head,body)>
<!ELEMENT head (from,to+,cc*,subject)>
<!ELEMENT from EMPTY>
<!ATTLIST from
  name    CDATA #IMPLIED
  address CDATA #REQUIRED>
<!ELEMENT to EMPTY>
<!ATTLIST to
  name    CDATA #IMPLIED
  address CDATA #REQUIRED>
A DTD for an Email Element (2)
<!ELEMENT cc EMPTY>
<!ATTLIST cc
  name    CDATA #IMPLIED
  address CDATA #REQUIRED>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT body (text,attachment*)>
<!ELEMENT text (#PCDATA)>
<!ELEMENT attachment EMPTY>
<!ATTLIST attachment
  file    CDATA #REQUIRED>
Interesting Parts of the DTD
A head element contains:
– a from element
– at least one to element
– zero or more cc elements
– a subject element
In from, to, and cc elements
– the name attribute is not required
– the address attribute is always required
A body element contains
– a text element
– possibly followed by a number of attachment elements
Lecture Outline
1. Introduction
2. Detailed Description of XML
3. Structuring
a) DTDs
b) XML Schema
4. Namespaces
5. Navigating XML documents: XPath
XML Schema
Richer language for structuring of XML documents
Its syntax is based on XML itself
Reuse and refinement of schemas
– Expand or delete already existent schemas
Sophisticated set of data types, compared to DTDs (which only support strings)
XML Schema (2)
An XML schema is an element with an opening tag like
<?xml version="1.0"?>
<xs:schema
xmlns:xs="http://www.w3.org/2001/XMLSchema"
...
...
</xs:schema>
A schema consists of element and attribute types
Element Types
<element name="head" type="headType"/>
<element name="to" type="nameAddress" minOccurs="1"/>
Cardinality constraints:
– minOccurs="x" (default value 1)
– maxOccurs="x" (default value 1)
– Generalizations of *, ?, + offered by DTDs
Attribute Types
<attribute name="id" type="ID" use="required"/>
<attribute name="speaks" type="Language" default="en"/>
Existence: use="x", where x may be optional or required
Default value: default="..." or fixed="..." (in the W3C XML Schema 1.0 Recommendation, default and fixed are attributes of their own rather than values of use)
Data Types
Built-in data types
– Numerical data types: integer, short etc.
– String types: string, ID, IDREF, CDATA etc.
– Date and time data types: time, month etc.
User-defined data types
– simple data types: which cannot use elements or attributes
– complex data types: which can use elements and attributes
Data Types (2)
Complex data types are defined from existing data types by defining some attributes (if any) and using indicators:
Order indicators
– sequence, a sequence of existing data type elements (order is important)
– all, a collection of elements that must appear (order is not important)
– choice, a collection of elements, of which one will be chosen
Occurrence Indicators
– maxOccurs
– minOccurs
A Data Type Example
<complexType name="lecturerType">
<sequence>
<element name="firstname" type="string"
minOccurs="0" maxOccurs="unbounded"/>
<element name="lastname" type="string"/>
</sequence>
<attribute name="title" type="string"
use="optional"/>
</complexType>
Mixed Content Example
<letter>
Dear Mr.<name>John Smith</name>.
Your order <orderid>1032</orderid>
will be shipped on <shipdate>2001-07-13</shipdate>.
</letter>
<xs:element name="letter">
<xs:complexType mixed="true">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="orderid" type="xs:positiveInteger"/>
<xs:element name="shipdate" type="xs:date"/>
</xs:sequence>
</xs:complexType>
</xs:element>
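Mixed content means that character data and child elements alternate inside one element. The sketch below, assuming Python's standard xml.etree.ElementTree module, shows how the three data fields and the running text of the letter can both be recovered; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

LETTER = """<letter>
Dear Mr.<name>John Smith</name>.
Your order <orderid>1032</orderid>
will be shipped on <shipdate>2001-07-13</shipdate>.
</letter>"""

letter = ET.fromstring(LETTER)

# The structured part: one value per child element.
fields = {child.tag: child.text for child in letter}

# The text between child elements is kept in .text (before the first child)
# and in each child's .tail (after that child).
text_after_name = letter.find("name").tail

# itertext() walks text and tails in document order; whitespace is
# normalised here only to make the result easy to read.
full_text = " ".join("".join(letter.itertext()).split())
```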
Simple Data Types
<simpleType name="dayOfMonth">
<restriction base="integer">
<minInclusive value="1"/>
<maxInclusive value="31"/>
</restriction>
</simpleType>
Data Type: Enumeration
<simpleType name="dayOfWeek">
<restriction base="string">
<enumeration value="Mon"/>
<enumeration value="Tue"/>
<enumeration value="Wed"/>
<enumeration value="Thu"/>
<enumeration value="Fri"/>
<enumeration value="Sat"/>
<enumeration value="Sun"/>
</restriction>
</simpleType>
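The restrictions in dayOfMonth and dayOfWeek amount to simple checks on a value. The following sketch spells out those checks in plain Python. It is not a schema validator, the function names are hypothetical, and int() is only a rough stand-in for the lexical rules of the schema integer type.

```python
def is_day_of_month(text):
    """Approximate dayOfMonth: an integer with minInclusive 1 and maxInclusive 31."""
    try:
        value = int(text)
    except ValueError:
        return False
    return 1 <= value <= 31


# The enumeration facet lists every permitted value explicitly.
DAYS_OF_WEEK = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def is_day_of_week(text):
    """Approximate dayOfWeek: a string restricted to the enumerated values."""
    return text in DAYS_OF_WEEK


month_checks = [is_day_of_month(v) for v in ("1", "31", "0", "32", "abc")]
week_checks = [is_day_of_week(v) for v in ("Tue", "Tuesday")]
```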
XML Schema: The Email Example
<element name="email" type="emailType"/>
<complexType name="emailType">
<sequence>
<element name="head" type="headType"/>
<element name="body" type="bodyType"/>
</sequence>
</complexType>
XML Schema: The Email Example (2)
<complexType name="headType">
<sequence>
<element name="from" type="nameAddress"/>
<element name="to" type="nameAddress"
minOccurs="1" maxOccurs="unbounded"/>
<element name="cc" type="nameAddress"
minOccurs="0" maxOccurs="unbounded"/>
<element name="subject" type="string"/>
</sequence>
</complexType>
XML Schema: The Email Example (3)
<complexType name="nameAddress">
<attribute name="name" type="string"
use="optional"/>
<attribute name="address"
type="string" use="required"/>
</complexType>

Similar for bodyType
Lecture Outline
1. HTML vs. XML
2. Detailed Description of XML
3. Structuring: DTD, XML Schema
4. Namespaces
5. Navigating XML documents: XPath
Namespaces
Namespaces make it possible to uniquely identify XML vocabularies by using a uniform resource identifier (URI)
Different independent groups can define the same objects differently in their schemas/vocabularies.
– This may lead to name clashes in XML documents when using multiple such schemas
A solution to this heterogeneity problem is namespaces
In XML documents, qualified names for elements and attributes are used
An Example
<instructors
xmlns="http://www.vu.com/empDTD"
xmlns:gu="http://www.gu.au/empDTD"
xmlns:uky="http://www.uky.edu/empDTD">
<uky:faculty
uky:title="assistant professor"
uky:name="John Smith"
uky:department="Computer Science"/>
<gu:academicStaff
gu:title="lecturer"
gu:name="Mate Jones"
gu:school="Information Technology"/>
</instructors>
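A parser resolves each prefix to its URI, so two elements with the same local name but different namespaces never clash. A sketch of reading the instructors document, assuming Python's standard xml.etree.ElementTree module, which writes qualified names as {URI}localname; the NS dictionary and variable names are introduced here for illustration.

```python
import xml.etree.ElementTree as ET

INSTRUCTORS = """<instructors
    xmlns="http://www.vu.com/empDTD"
    xmlns:gu="http://www.gu.au/empDTD"
    xmlns:uky="http://www.uky.edu/empDTD">
  <uky:faculty
      uky:title="assistant professor"
      uky:name="John Smith"
      uky:department="Computer Science"/>
  <gu:academicStaff
      gu:title="lecturer"
      gu:name="Mate Jones"
      gu:school="Information Technology"/>
</instructors>"""

# Prefixes used in the search paths below; they need not match the document's
# prefixes, only the URIs matter.
NS = {
    "gu": "http://www.gu.au/empDTD",
    "uky": "http://www.uky.edu/empDTD",
}

root = ET.fromstring(INSTRUCTORS)

# The unprefixed root element falls into the default namespace.
root_tag = root.tag

faculty = root.find("uky:faculty", NS)
staff = root.find("gu:academicStaff", NS)

# Prefixed attributes are namespaced too and are looked up by {URI}name.
faculty_name = faculty.get("{http://www.uky.edu/empDTD}name")
staff_name = staff.get("{http://www.gu.au/empDTD}name")
```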
Namespace Declarations
This way, an XML document may use more than one DTD or schema, each having a different prefix
Namespaces are declared within an element and can be used in that element and any of its children (elements and attributes)
A namespace declaration has the form:
– xmlns:prefix="location"
– location is the address of the DTD or schema
If a prefix is not specified: xmlns="location" then the location is used by default
XML Vocabularies/Applications
Web applications must agree on common vocabularies to communicate and collaborate
Communities and business sectors are defining their specialized vocabularies
– XHTML
– Dublin Core (DC)
– mathematics (MathML)
– bioinformatics (BSML)
– …
Lecture Outline
1. Introduction
2. Detailed Description of XML
3. Structuring: XML Schema
4. Namespaces
5. Navigating XML documents: XPath; XQuery
Addressing and Querying XML Documents
In relational databases, parts of a database can be selected and retrieved using SQL
– The same is necessary for XML documents
– Query languages: XQuery, XQL, XML-QL
The central concept of XML query languages is a path expression
– Specifies how a node or a set of nodes, in the tree representation of the XML document can be reached
XPath
XPath is core for XML query languages
Language for addressing parts of an XML document.
– It operates on the tree data model of XML
– It has a non-XML syntax
Tree Structure of an XML Document
The root node
Element nodes
Text nodes
Attribute nodes
Comment nodes
…
An XML Example
<library location="Bremen">
<author name="Henry Wise">
<book title="Artificial Intelligence"/>
<book title="Modern Web Services"/>
<book title="Theory of Computation"/>
</author>
<author name="William Smart">
<book title="Artificial Intelligence" price="30" />
</author>
<author name="Cynthia Singleton">
<book title="The Semantic Web" price= "40.99" />
<book title="Browser Technology Revised"/>
</author>
</library>
Tree Representation
[Figure: tree representation of the library document]
Examples of Path Expressions in XPath
Address all author elements (absolute path)
/library/author
Addresses all author elements that are children of the library element node
Examples of Path Expressions in XPath (2)
Address all author elements
//author
This path expression addresses all author elements anywhere in the document
Examples of Path Expressions in XPath (3)
Address the location attribute nodes within library element nodes
/library/@location
The symbol @ is used to denote attribute nodes
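The first three path expressions can be tried out on the library document. The sketch assumes Python's standard xml.etree.ElementTree module, which supports only a subset of XPath and evaluates paths relative to an element, so each query is rewritten accordingly; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

LIBRARY = """<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence" price="30"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web" price="40.99"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>"""

# fromstring() returns the library element, so paths start below it.
library = ET.fromstring(LIBRARY)

# /library/author : author elements that are children of library
child_authors = [a.get("name") for a in library.findall("author")]

# //author : author elements anywhere in the document
all_authors = [a.get("name") for a in library.iter("author")]

# /library/@location : attribute nodes are read with get() in ElementTree
location = library.get("location")
```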
Examples of Path Expressions in XPath (4)
Address all books with title “Artificial Intelligence”
//book[@title="Artificial Intelligence"]
Test within square brackets: a filter expression
– It restricts the set of addressed nodes.
– Query 4 addresses book elements, the title of which satisfies a certain condition.
Examples of Filter Expressions (Predicates)
Address the first author element node in the XML document
//author[1]
Address the title of last book element within the first author element node in the document
//author[1]/book[last()]/@title
Address the title of all book elements having price greater than 30
//book[@price>30]/@title
Address the title of all book elements having no price
//book[not(@price)]/@title
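The filter expressions can be checked against the library document as well. This sketch again assumes Python's standard xml.etree.ElementTree module. Its XPath subset handles attribute-equality and position predicates, but not comparisons such as @price>30 or not(@price), so those two filters are written as ordinary Python conditions instead.

```python
import xml.etree.ElementTree as ET

LIBRARY = """<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence" price="30"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web" price="40.99"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>"""

library = ET.fromstring(LIBRARY)

# //book[@title="Artificial Intelligence"]
ai_books = library.findall(".//book[@title='Artificial Intelligence']")

# //author[1]
first_author = library.find("author[1]").get("name")

# //author[1]/book[last()]/@title
last_title = library.find("author[1]/book[last()]").get("title")

# //book[@price>30]/@title  (comparison done in Python)
expensive_titles = [
    b.get("title")
    for b in library.iter("book")
    if b.get("price") is not None and float(b.get("price")) > 30
]

# //book[not(@price)]/@title  (absence test done in Python)
unpriced_titles = [
    b.get("title") for b in library.iter("book") if b.get("price") is None
]
```

Note that the book priced exactly 30 is not selected, since the condition is strictly greater than 30.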
A Few More Queries
Select the titles of books having “Modern” in title
//book[contains(@title, 'Modern')]/@title
Selecting several paths: gives the name of all authors and the titles of their books
//author/@name | //book/@title
Returns the attribute values of title and price of all book elements
//book/@title | //book/@price
Path expression with wildcard: selects all the elements in the document
//*
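The remaining queries use contains(), the union operator |, and the wildcard, none of which the XPath subset of Python's standard xml.etree.ElementTree module evaluates directly. The sketch below therefore emulates them with plain Python over the same library document; it illustrates what the queries select and is not an XPath engine.

```python
import xml.etree.ElementTree as ET

LIBRARY = """<library location="Bremen">
  <author name="Henry Wise">
    <book title="Artificial Intelligence"/>
    <book title="Modern Web Services"/>
    <book title="Theory of Computation"/>
  </author>
  <author name="William Smart">
    <book title="Artificial Intelligence" price="30"/>
  </author>
  <author name="Cynthia Singleton">
    <book title="The Semantic Web" price="40.99"/>
    <book title="Browser Technology Revised"/>
  </author>
</library>"""

library = ET.fromstring(LIBRARY)

# //book[contains(@title, 'Modern')]/@title
modern_titles = [
    b.get("title") for b in library.iter("book") if "Modern" in b.get("title")
]

# //author/@name | //book/@title
# A union yields nodes in document order: each author name is followed
# by the titles of that author's books.
names_and_titles = []
for author in library.iter("author"):
    names_and_titles.append(author.get("name"))
    names_and_titles.extend(b.get("title") for b in author.iter("book"))

# //* : every element in the document, including the library element itself
all_tags = [element.tag for element in library.iter()]
```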
XQuery
Do it yourself
Review
XML is a meta-language that allows users to define markup.
Nesting of tags introduces structure. The structure of documents can be enforced using XML schemas or DTDs.
XML separates content and structure from formatting.
XML supports the exchange of structured information across different applications through markup, structure, and transformations.
XML is supported by query languages
Literature
For XML, XML Schema, XPath, XQuery
– Book: XML Databases and the Semantic Web, ch 8
– “EditiX - XML basics_xPath_xQuery.pdf”