XML - Simon Fraser University
XML and Web Data
CMPT 354: Database I -- XML

Data in HTML
• HyperText Markup Language
  – Different data elements are set out using tags
• No schema?
  – Based on the data itself, we can make a reasonable guess about the structure
  – “Self-describing”

Object and Schema

Semi-structured Data
• Object-like: it can be represented as a collection of objects
• Schemaless: it is not guaranteed to conform to any type structure
• Self-describing
  – Often carries only the names of the attributes and has a lower degree of organization than the data in a database
• Semi-structured data: data with the above characteristics

Schemaless But Self-Describing
  (#12345,
    [ListName: "Students",
     Contents: {
       [Name: "John Doe", ID: "111111111",
        Address: [Number: 123, Street: "Main St"]],
       [Name: "Joe Public", Id: "666666666",
        Address: [Number: 666, Street: "Hollow Rd"]]
     }])

XML
• Extensible Markup Language
  – A standard adopted in 1998 by the W3C (World Wide Web Consortium)
• Optional mechanisms for specifying document structure
  – DTD: the Document Type Definition language, part of the XML standard
  – XML Schema: a more recent specification built on top of XML
• Query languages for XML
  – XPath: lightweight
  – XSLT: document transformation language
  – XQuery: a full-blown query language

From HTML to XML

HTML and XML
• HTML
  – A fixed number of tags
  – Each tag has its own well-defined meaning
    • E.g., <table> … </table>
• XML: HTML-like language
  – An arbitrary number of user-defined tags
  – No a priori semantics
  – Mainly for data exchange
  – Displayed using a stylesheet

Important Differences
• XML contains a large assortment of tags chosen by the document author
  – The only valid tags in HTML are those sanctioned by the official specification of the language; other tags are ignored
• Every opening tag must have a matching closing tag, and the tags must be properly nested
  – E.g., <a><b></a></b> is not allowed
  – Some HTML tags are not required to be closed, e.g., <p>
• The document has a root element – the element that contains all other elements

Example
  [Figure: an annotated XML document with callouts for the mandatory statement, root element, XML elements, element names, and element contents]

Hierarchical Structure
  PersonList [Student]
    Title
    Contents
      Person
        Name: John Doe
        Id: 111111111
        Address
          Number: 123
          Street: Main St
      Person
        Name: Joe Public
        Id: 666666666
        Address
          Number: 666
          Street: Hollow Rd

Attributes
• <PersonList Type="Student">
  – Type is the name of an attribute that belongs to the element PersonList
  – Student is the attribute value
  – All attribute values must be quoted
  – Text strings between tags do not need to be quoted
• Empty element
  – <Title Value="Student List"/>
  – The element has one attribute and no content
  – A shorthand for <Title Value="Student List"></Title>

Processing Instructions & Comments
• Processing instructions
  – <?xml version="1.0" ?>
  – Contain anything the author might want to communicate to the XML processor, e.g., <?my-command go bring coffee?>
  – Rarely used
• Comment
  – <!-- A comment -->
  – Can occur anywhere except inside markup, i.e., between the symbols < and >
  – An integral part of the document
  – May be used by a receiver (e.g., a browser)

CDATA Construct
• Includes strings of characters containing markup that might otherwise make the document ill formed
• <![CDATA[ This is an example of markup in HTML: <b><i> Example </b></i> ]]>

XML Elements and Data Objects
• XML allows mixed data/text structure
  – A legal XML document:
      <Address> Sally lives on <Street> Main St </Street> house number <Number> 123 </Number> in the beautiful Anytown, Canada. </Address>
• XML elements are ordered
      <Address> <Number> 123 </Number> <Street> Main St </Street> </Address>
    is different from
      <Address> <Street> Main St </Street> <Number> 123 </Number> </Address>
• XML has only one primitive type, string, and very weak facilities for specifying constraints

Use of Attributes
• An element can have any number of user-defined attributes
• Whatever attributes can do can also be achieved with elements
  – An attribute may occur only once within a tag, while subelements with the same tag may be repeated
• Attributes introduce ambiguity as to whether to represent information as attributes or as elements
  – Sometimes convenient for representing data, but the same can be done with elements
  – The use of attributes is expected to decline
      <Address> <Number> 123 </Number> <Street> Main St </Street> </Address>
      <Address Number="123" Street="Main St"/>

Attributes in Markup
  <Act Number="5">
    <Scene Number="1" Place="Mantua. A street">
      …
      <Apothecary Voice="scared">
        Such mortal drugs I have; but Mantua’s law
        Is death to any he that utters them.
      </Apothecary>
      <Romeo Voice="persistent">
        Art thou so bare and full of wretchedness,
        And fear’st to die? …
      </Romeo>
      …
    </Scene>
  </Act>

Advantages of Attributes
• Attributes in an element are not ordered
  – <Address Number="123" Street="Main St"/>
  – <Address Street="Main St" Number="123"/>
• Attributes are more succinct
  – Compare with <Address> <Number> 123 </Number> <Street> Main St </Street> </Address>
• Attributes can be declared to have unique values and can be used to enforce a limited kind of referential integrity

ID and IDREF – Cross-References

Well Formed XML Document
• It has a root element
• Every opening tag is followed by a matching closing tag, and the elements are properly nested inside each other
• Any attribute can occur at most once in a given opening tag, its value must be provided, and this value must be quoted

Namespaces
• A term (tag) might have different meanings in different contexts
  – <Name><First>John</First> <Last>Doe</Last></Name>
  – <Name>Simon Fraser University</Name>
• Every XML tag must have two parts: namespace and local name
  – General structure: namespace:local-name
  – Namespace represented by a URI (uniform resource identifier)
    • An abstract identifier (a general unique string)
    • URL (uniform resource locator)

Example – Namespace
• Namespaces are defined using the attribute xmlns
  – All names xml* should be considered reserved
• Default namespace xmlns="…"
  – Only one default namespace
• Other namespace xmlns:toy="…"
  – Prefixes (e.g., toy) must be distinct

  <item xmlns="http://www.acmeinc.com/jp#supplies"
        xmlns:toy="http://www.acmeinc.com/jp#toys">
    <name>backpack</name>
    <feature>
      <toy:item>
        <toy:name>cyberpet</toy:name>
      </toy:item>
    </feature>
  </item>

Namespace Declarations
• Namespace as prefix
  – E.g., toy:item, toy:name
  – Tags without a prefix belong to the default namespace
• Namespace declarations have scope
  – Can be nested like a program block
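
The prefix rules above can be tried out with a parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which reports each tag as "{namespace-URI}local-name". The helper split_tag and the variable names are illustrative and not part of the slides; the document is the backpack/cyberpet example from the namespace slide.

```python
import xml.etree.ElementTree as ET

# The namespace example from the slides, with straight quotes.
DOC = """\
<item xmlns="http://www.acmeinc.com/jp#supplies"
      xmlns:toy="http://www.acmeinc.com/jp#toys">
  <name>backpack</name>
  <feature>
    <toy:item>
      <toy:name>cyberpet</toy:name>
    </toy:item>
  </feature>
</item>
"""


def split_tag(tag):
    """Split ElementTree's '{uri}local' tag form into (uri, local).

    Hypothetical helper for this sketch; tags without a namespace
    are returned as (None, tag).
    """
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


root = ET.fromstring(DOC)

# Walk every element in document order and resolve its namespace.
resolved = [split_tag(el.tag) for el in root.iter()]

for uri, local in resolved:
    print(f"{local:8} -> {uri}")
```

Unprefixed tags (item, name, feature) resolve to the default namespace, while toy:item and toy:name resolve to the URI bound to the toy prefix, so the two item elements are different names even though their local parts match.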

Example – Scopes of Namespaces
  <item xmlns="http://www.acmeinc.com/jp#supplies"
        xmlns:toy="http://www.acmeinc.com/jp#toys">
    <name>backpack</name>
    <feature>
      <toy:item>
        <toy:name>cyberpet</toy:name>
      </toy:item>
    </feature>
    <item xmlns="http://www.acmeinc.com/jp#supplies2"
          xmlns:toy="http://www.acmeinc.com/jp#toys2">
      <name>notebook</name>
      <toy:name>sticker</toy:name>
    </item>
  </item>

More About Namespace
• The name of a namespace is just a string that happens to be a URL
• It is not necessarily a real address that contains some kind of schema describing the corresponding set of names
• Don’t be misled by the URL!

Summary
• HTML and XML: differences and applications
• Structure of XML
  – Elements
  – Attributes
  – Well formed XML documents
• Namespace

To-Do-List
• Can every relational table be represented in XML? Can every XML document be represented in a relational table?
• RSS is an application of XML. Try to understand the two RSS segments at http://www.xml.com/pub/a/2002/12/18/diveinto-xml.html
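
The well-formedness rules and the ordering properties listed in the summary can be checked mechanically. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module as the parser. The helper is_well_formed and the variable names are illustrative only; the XML fragments are taken from, or modeled on, the slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts `text` as an XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Violations of the well-formedness rules.
improper_nesting = is_well_formed("<a><b></a></b>")        # tags not nested
unclosed_tag = is_well_formed("<p>text")                   # no closing tag
unquoted_value = is_well_formed("<Address Number=123/>")   # value not quoted
repeated_attr = is_well_formed('<Address Number="123" Number="666"/>')
two_roots = is_well_formed("<a/><b/>")                     # no single root

# A legal empty element with one attribute.
empty_element = is_well_formed('<Title Value="Student List"/>')

# Attributes are unordered: both spellings give the same attribute set.
a = ET.fromstring('<Address Number="123" Street="Main St"/>')
b = ET.fromstring('<Address Street="Main St" Number="123"/>')
same_attributes = a.attrib == b.attrib

# Elements are ordered: swapping children gives a different document.
c = ET.fromstring(
    "<Address><Number>123</Number><Street>Main St</Street></Address>")
d = ET.fromstring(
    "<Address><Street>Main St</Street><Number>123</Number></Address>")
same_child_order = [e.tag for e in c] == [e.tag for e in d]

print(improper_nesting, unclosed_tag, unquoted_value, repeated_attr, two_roots)
print(empty_element, same_attributes, same_child_order)
```

A parser only reports whether a document is well formed. Whether it also conforms to a DTD or an XML Schema is a separate validation step that this sketch does not perform.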