Download XML - Simon Fraser University

XML and Web Data
Data in HTML
• HyperText Markup
Language
– Different data elements
are set out using tags
• No schema?
– Based on the data
itself, we can make a
reasonable guess about
the structure
– “Self-describing”
CMPT 354: Database I -- XML
Object and Schema
Semi-structured Data
• Object-like: it can be represented as a
collection of objects
• Schemaless: it is not guaranteed to conform
to any type structure
• Self-describing
– Often carries only the names of the attributes
and has a lower degree of organization than
data in a database
• Semi-structured data: data with the above
characteristics
Schemaless But Self-Describing
(#12345,
[ListName:“Students”,
Contents:{ [Name:“John Doe”,
ID:“111111111”,
Address:[Number:123, Street:“Main St”] ],
[Name:“Joe Public”,
Id:“666666666”,
Address:[Number:666, Street:“Hollow Rd”] ]}
])
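A minimal sketch of what "self-describing" means in practice, assuming the object above is rendered as nested Python dictionaries and lists (the `students` variable and the `guess_structure` helper are hypothetical names, not part of the slides). The structure is inferred from the data alone. Because nothing enforces a schema, the two records, which spell the key as `ID` and `Id`, come out with two different shapes.

```python
# Hypothetical rendering of the slide's object as nested Python values.
students = {
    "ListName": "Students",
    "Contents": [
        {"Name": "John Doe", "ID": "111111111",
         "Address": {"Number": 123, "Street": "Main St"}},
        {"Name": "Joe Public", "Id": "666666666",
         "Address": {"Number": 666, "Street": "Hollow Rd"}},
    ],
}


def guess_structure(value):
    """Infer a structure description from the data alone."""
    if isinstance(value, dict):
        return {key: guess_structure(item) for key, item in value.items()}
    if isinstance(value, list):
        # Collect each distinct shape seen among the list members.
        shapes = []
        for item in value:
            shape = guess_structure(item)
            if shape not in shapes:
                shapes.append(shape)
        return shapes
    return type(value).__name__


structure = guess_structure(students)
print(structure)
```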
XML
• Extensible Markup Language
– A standard adopted in 1998 by the W3C (World Wide
Web Consortium)
• Optional mechanisms for specifying document
structure
– DTD: the Document Type Definition Language, part of
the XML standard
– XML Schema: a more recent specification built on top of
XML
• Query languages for XML
– XPath: lightweight
– XSLT: document transformation language
– XQuery: a full-blown language
From HTML to XML
HTML and XML
• HTML
– A fixed number of tags
– Each tag has its own well-defined meaning
• E.g., <table> … </table>
• XML: HTML-like language
– An arbitrary number of user-defined tags
– No a priori semantics
– Mainly for data exchange
– Display using stylesheet
Important Differences
• XML contains a large assortment of tags chosen
by the document author
– The only valid tags in HTML are those sanctioned by
the official specification of the language; other tags are
ignored
• Every opening tag must have a matching closing
tag, and the tags must be properly nested
– E.g., <a><b></a></b> is not allowed
– Some HTML tags are not required to be closed, e.g.,
<p>
• The document has a root element – the element
that contains all other elements
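The nesting and root-element rules above can be checked with any XML parser. The following is a small sketch using Python's standard `xml.etree.ElementTree` module; the helper name `is_well_formed` and the sample strings other than `<a><b></a></b>` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


properly_nested = is_well_formed("<a><b></b></a>")
badly_nested = is_well_formed("<a><b></a></b>")      # the slide's example
unclosed_tag = is_well_formed("<doc><p>text</doc>")  # <p> left open, HTML style
two_roots = is_well_formed("<a></a><b></b>")         # no single root element

print(properly_nested, badly_nested, unclosed_tag, two_roots)
```

Only the first string parses; the other three are rejected by the parser rather than silently repaired, which is the main practical difference from how browsers treat HTML.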
Example
[Figure: an annotated XML document with callouts for the mandatory statement, the root element, XML elements, element names, and element contents]
Hierarchical Structure
PersonList (Student)
  Title
  Contents
    Person
      Name: John Doe
      Id: 111111111
      Address
        Number: 123
        Street: Main St
    Person
      Name: Joe Public
      Id: 666666666
      Address
        Number: 666
        Street: Hollow Rd
Attributes
• <PersonList Type="Student">
– Type is the name of an attribute that belongs to the
element PersonList
– Student is the attribute value
– All attribute values must be quoted
– Text strings between tags do not need to be quoted
• Empty element
– <Title Value="Student List"/>
– The element has one attribute and no content
– A shorthand for <Title Value="Student List"></Title>
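As a rough illustration of the points above, the sketch below reads the attributes back with Python's standard `xml.etree.ElementTree` module and compares the empty-element shorthand with its long form. The small document is assembled from the slide's two fragments; combining them into one document is an assumption.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<PersonList Type="Student">'
    '<Title Value="Student List"/>'
    '</PersonList>'
)

list_type = doc.get("Type")        # attribute value of the root element
title = doc.find("Title")
title_value = title.get("Value")

# The long form of the empty element parses to the same information.
long_form = ET.fromstring('<Title Value="Student List"></Title>')
same_element = (
    title.tag == long_form.tag
    and title.attrib == long_form.attrib
    and (title.text or "") == (long_form.text or "")
    and len(title) == len(long_form)
)

print(list_type, title_value, same_element)
```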
Processing Instructions & Comments
• Processing instructions
– <?xml version="1.0" ?>
– Contain anything the author might want to communicate
to the XML processor, e.g., <?my-command go bring
coffee?>
– Rarely used
• Comment
– <!-- A comment -->
– Can occur anywhere except inside markup, i.e.,
between the symbols < and >
– An integral part of the document
– May be used by a receiver (e.g., a browser)
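A short sketch of how a receiver can see comments and processing instructions, using Python's standard `xml.dom.minidom` module, which keeps both as nodes in the parsed tree. The `Doc` element and the overall document are made up for the example; only the comment and the `my-command` instruction come from the slide.

```python
from xml.dom import minidom

SOURCE = (
    '<?xml version="1.0" ?>'
    '<!-- A comment -->'
    '<Doc><?my-command go bring coffee?></Doc>'
)

dom = minidom.parseString(SOURCE)

# The comment sits before the root element and is kept as a node.
comment = dom.childNodes[0]

# The processing instruction is the first child of the root element.
instruction = dom.documentElement.firstChild

print(repr(comment.data))
print(instruction.target, "->", instruction.data)
```

The parser does not interpret `my-command`; it only hands the target and its data to the application, which decides what, if anything, to do with them.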
CDATA Construct
• Includes strings of characters containing
markup that would otherwise make the
document ill-formed
• <![CDATA[ This is an example of markup in
HTML: <b><i> Example </b></i>]]>
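The effect of CDATA can be sketched with Python's standard `xml.etree.ElementTree` module. The wrapping `Note` element is a hypothetical container added so that the fragment forms a complete document. Inside the CDATA section the badly nested HTML tags are plain characters, whereas the same tags outside CDATA are rejected.

```python
import xml.etree.ElementTree as ET

MARKUP = "<b><i> Example </b></i>"  # improperly nested on purpose

with_cdata = (
    "<Note><![CDATA[ This is an example of markup in HTML: "
    + MARKUP + "]]></Note>"
)
without_cdata = "<Note>This is an example of markup in HTML: " + MARKUP + "</Note>"

note = ET.fromstring(with_cdata)
child_count = len(note)   # the tags inside CDATA are text, not child elements
note_text = note.text

try:
    ET.fromstring(without_cdata)
    rejected = False
except ET.ParseError:
    rejected = True

print(child_count, rejected)
print(note_text)
```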
XML Elements and Data Objects
• XML allows mixed data/text structure
• XML elements are ordered
• XML has only one primitive type, string, and
very weak facilities for specifying constraints
A legal XML document
<Address>
Sally lives on
<Street> Main St </Street>
house number
<Number> 123 </Number>
in the beautiful Anytown, Canada.
</Address>
<Address>
<Number> 123 </Number>
<Street> Main St </Street>
</Address>
is different from
<Address>
<Street> Main St </Street>
<Number> 123 </Number>
</Address>
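Both observations, mixed text and ordered elements, show up directly in a parsed tree. The sketch below uses Python's standard `xml.etree.ElementTree` module on the slide's examples, with whitespace inside the tags trimmed for readability.

```python
import xml.etree.ElementTree as ET

# Mixed content: free text interleaved with child elements.
mixed = ET.fromstring(
    "<Address>Sally lives on <Street>Main St</Street> house number "
    "<Number>123</Number> in the beautiful Anytown, Canada.</Address>"
)
text_before_street = mixed.text   # text before the first child
text_after_street = mixed[0].tail # text between Street and Number
full_text = "".join(mixed.itertext())

# Ordered elements: the same children in a different order.
number_first = ET.fromstring(
    "<Address><Number>123</Number><Street>Main St</Street></Address>"
)
street_first = ET.fromstring(
    "<Address><Street>Main St</Street><Number>123</Number></Address>"
)
order_a = [child.tag for child in number_first]
order_b = [child.tag for child in street_first]

print(full_text)
print(order_a, order_b, order_a == order_b)
```

A relational tuple with attributes Number and Street would make no such distinction, which is one reason XML documents do not map one-to-one onto rows.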
Use of Attributes
• An element can have any number of user-defined
attributes
• Anything attributes can do can also be achieved with elements
– An attribute may occur only once within a tag, while subelements
with the same tag may be repeated
• Attributes introduce ambiguity as to whether to represent
information as attributes or elements
– Attributes are sometimes convenient for representing data, but the
same can be done with elements
– The use of attributes is expected to decline
<Address>
<Number> 123 </Number>
<Street> Main St </Street>
</Address>
<Address Number="123" Street="Main St"/>
Attributes in Markup
<Act Number="5">
<Scene Number="1" Place="Mantua. A street">
…
<Apothecary Voice="scared">
Such mortal drugs I have; but Mantua’s law
Is death to any he that utters them.
</Apothecary>
<Romeo Voice="persistent">
Art thou so bare and full of wretchedness,
And fear’st to die?
…
</Romeo>
…
</Scene>
</Act>
Advantages of Attributes
• Attributes in an element are not ordered
– <Address Number="123" Street="Main St"/>
– <Address Street="Main St" Number="123"/>
• Attributes are more succinct
• Attributes can be declared to have unique values
and can be used to enforce a limited kind of
referential integrity
<Address>
<Number> 123 </Number>
<Street> Main St </Street>
</Address>
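That attribute order carries no meaning can be seen by parsing both spellings and comparing the results. This is a minimal sketch with Python's standard `xml.etree.ElementTree` module, which exposes attributes as a dictionary.

```python
import xml.etree.ElementTree as ET

first = ET.fromstring('<Address Number="123" Street="Main St"/>')
second = ET.fromstring('<Address Street="Main St" Number="123"/>')

# Attributes are name/value pairs; comparing them ignores the written order.
same_attributes = first.attrib == second.attrib

print(same_attributes)
```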
ID and IDREF – Cross-References
Well Formed XML Document
• It has a root element
• Every opening tag is followed by a matching
closing tag, and the elements are properly
nested inside each other
• Any attribute can occur at most once in a
given opening tag, its value must be
provided, and this value must be quoted
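The attribute rules can be exercised with a parser as well. The sketch below uses Python's standard `xml.etree.ElementTree` module; the helper `parse_error` and the sample `Address` tags are assumptions chosen to trigger each rule in turn.

```python
import xml.etree.ElementTree as ET


def parse_error(text):
    """Return the parser's error message, or None if text is well formed."""
    try:
        ET.fromstring(text)
        return None
    except ET.ParseError as exc:
        return str(exc)


ok = parse_error('<Address Number="123"/>')
duplicate = parse_error('<Address Number="123" Number="456"/>')  # occurs twice
unquoted = parse_error('<Address Number=123/>')                  # value not quoted
no_value = parse_error('<Address Number/>')                      # value missing

for label, message in [("ok", ok), ("duplicate", duplicate),
                       ("unquoted", unquoted), ("no value", no_value)]:
    print(label, "->", message)
```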
Namespaces
• A term (tag) might have different meanings in
different contexts
– <Name><First>John</First> <Last>Doe</Last></Name>
– <Name>Simon Fraser University</Name>
• Every XML tag must have two parts: namespace
and local name
– General structure: namespace:local-name
– Namespace represented by URI (uniform resource
identifier)
• An abstract identifier (a general unique string)
• URL (uniform resource locator)
Example – Namespace
• Namespaces are defined using the attribute xmlns
– All names beginning with xml should be considered reserved
• Default namespace xmlns=“…”
– Only one default namespace
• Other namespace xmlns:toy=“…”
– Prefixes (e.g., toy) must be distinct
<item xmlns="http://www.acmeinc.com/jp#supplies"
xmlns:toy="http://www.acmeinc.com/jp#toys">
<name>backpack</name>
<feature>
<toy:item>
<toy:name>cyberpet</toy:name>
</toy:item>
</feature>
</item>
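To see how a parser resolves these names, the example can be loaded with Python's standard `xml.etree.ElementTree` module, which reports every tag as `{namespace-URI}local-name`. The prefix `t` in the lookup is an arbitrary choice made for this sketch, picked to show that only the URI matters and not the prefix written in the document.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<item xmlns="http://www.acmeinc.com/jp#supplies" '
    'xmlns:toy="http://www.acmeinc.com/jp#toys">'
    '<name>backpack</name>'
    '<feature><toy:item><toy:name>cyberpet</toy:name></toy:item></feature>'
    '</item>'
)

# Every element in document order, with its namespace made explicit.
tags = [element.tag for element in doc.iter()]
for tag in tags:
    print(tag)

# Look up toy:name using a different prefix bound to the same URI.
toy_name = doc.find(".//t:name", {"t": "http://www.acmeinc.com/jp#toys"})
print(toy_name.text)
```

The two `item` elements and the two `name` elements end up with different full names, so they cannot be confused even though their local names match.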
Namespace Declarations
• Namespace as prefix
– E.g., toy:item, toy:name
– Tags without prefix belong to the default
namespace
• Namespace declarations have scope
– Can be nested like a program block
Example – Scopes of Namespaces
<item xmlns="http://www.acmeinc.com/jp#supplies"
xmlns:toy="http://www.acmeinc.com/jp#toys">
<name>backpack</name>
<feature>
<toy:item>
<toy:name>cyberpet</toy:name>
</toy:item>
</feature>
<item xmlns="http://www.acmeinc.com/jp#supplies2"
xmlns:toy="http://www.acmeinc.com/jp#toys2">
<name>notebook</name>
<toy:name>sticker</toy:name>
</item>
</item>
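The scoping behaviour in this example can be checked by resolving each leaf element to its full name. The sketch uses Python's standard `xml.etree.ElementTree` module; the `SCOPED` and `resolved` names are introduced here only for illustration.

```python
import xml.etree.ElementTree as ET

SCOPED = """<item xmlns="http://www.acmeinc.com/jp#supplies"
      xmlns:toy="http://www.acmeinc.com/jp#toys">
  <name>backpack</name>
  <feature>
    <toy:item>
      <toy:name>cyberpet</toy:name>
    </toy:item>
  </feature>
  <item xmlns="http://www.acmeinc.com/jp#supplies2"
        xmlns:toy="http://www.acmeinc.com/jp#toys2">
    <name>notebook</name>
    <toy:name>sticker</toy:name>
  </item>
</item>"""

root = ET.fromstring(SCOPED)

# Map the text of each leaf element to its namespace-qualified tag.
resolved = {
    element.text: element.tag
    for element in root.iter()
    if element.text and element.text.strip()
}

for text, tag in resolved.items():
    print(text, "->", tag)
```

Inside the inner `item`, both the default namespace and the `toy` prefix are rebound, so `name` and `toy:name` there resolve to the `supplies2` and `toys2` URIs, much as an inner block can shadow a variable declared in an outer block.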
More About Namespace
• The name of a namespace is just a string
that happens to be a URL
• It is not necessarily a real address that
contains some kind of schema describing
the corresponding set of names
• Don’t be misled by the URL!
Summary
• HTML and XML: differences and
applications
• Structure of XML
– Elements
– Attributes
– Well formed XML documents
• Namespace
To-Do-List
• Can every relational table be represented in
XML? Can every XML document be
represented in a relational table?
• RSS is an application of XML. Try to
understand the two RSS segments at
http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html
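As a starting point for the first question, here is one possible encoding of a relational table in XML, sketched with Python's standard `xml.etree.ElementTree` module. The `table_to_xml` helper, the `Row` element name, and the `Student` table are all assumptions; the sketch also assumes column names are valid XML names and ignores NULL values. It addresses only the table-to-XML direction and leaves the reverse direction open.

```python
import xml.etree.ElementTree as ET


def table_to_xml(table_name, columns, rows):
    """Encode a table as one element per row and one subelement per column."""
    root = ET.Element(table_name)
    for row in rows:
        row_element = ET.SubElement(root, "Row")
        for column, value in zip(columns, row):
            ET.SubElement(row_element, column).text = str(value)
    return root


student = table_to_xml(
    "Student",
    ["Id", "Name"],
    [("111111111", "John Doe"), ("666666666", "Joe Public")],
)
xml_text = ET.tostring(student, encoding="unicode")
print(xml_text)
```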