Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Oracle Database wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Functional Database Model wikipedia , lookup
Concurrency control wikipedia , lookup
Versant Object Database wikipedia , lookup
Relational model wikipedia , lookup
ContactPoint wikipedia , lookup
XML Salman Azhar Semi-structured Data XML (Extensible Markup Language) Well-formed and Valid XML Document Type Definitions IDs and IDREFs These slides use some figures, definitions, and explanations from ElmasriNavathe’s Fundamentals of Database Systems and Molina-Ullman-Widom’s Database Systems 2/6/05 Salman Azhar: Database Systems 1 Framework 1. Information Integration : 2. Semi-structured Data : 3. Making databases from various places work as one. A new data model designed to cope with problems of information integration. XML : A standard language for describing semistructured data schemas and representing data. 2/6/05 Salman Azhar: Database Systems 2 1. Information Integration Generally databases in an enterprises have: Several underlying database management systems Oracle, Informix, MS SQL Server, Sybase (SQL Server), DB2, MS Access, etc. Several underlying database schemas Information in an employee table can contain 2/6/05 Employee Name, SSN, DOB, title, hrsPerWeek. modifiedTime, modifiedBy Employee Name, SSN, DOB, title, degree, createTime, createBy Employee Name, SSN, DOB, title, salary, modifiedTime, modifiedBy, createTime, createBy Salman Azhar: Database Systems 3 2. Semi-structured Data A new data model designed to cope with problems of information integration Accommodates of different DBMS Oracle, Informix, MS SQL Server, Sybase (SQL Server), DB2, MS Access, etc. Integrates different schemas 2/6/05 Employee Name, SSN, DOB, title, hrsPerWeek, modifiedTime, modifiedBy Employee Name, SSN, DOB, title, degree, createTime, createBy Employee Name, SSN, DOB, title, salary, createTime, createBy, modifiedTime, modifiedBy Salman Azhar: Database Systems 4 3. XML A standard language for describing semi-structured data schemas and representing data. 2/6/05 Salman Azhar: Database Systems 5 The Information-Integration Problem Major bottleneck in enterprise application integration For example… Hewlett Packard split into HP and Agilent HP bought Compaq 2/6/05 Need to separate data into different destinations Need to integrate data from different sources Salman Azhar: Database Systems 6 The Information-Integration Problem Related data exists in many places and could, in principle, work together. But different databases differ in: Model 1. relational, object-oriented? Schema 2. normalized/denormalized? Terminology 3. are consultants employees? Retirees? Subcontractors? Conventions 4. 2/6/05 meters versus feet? Salman Azhar: Database Systems 7 Example Consider merger of two stores in a Mall may be some overlap in the products sold but the databases are different 2/6/05 Salman Azhar: Database Systems 8 Example Each company has a database One may use a relational DBMS One stores the phones of distributors, the other does not One distinguishes products in one department the other keeps the data in an MS-Word document the other doesn’t One counts inventory by number of items, 2/6/05 the other by cases Salman Azhar: Database Systems 9 Two Approaches to Integration 1. Warehousing Makes a copy of the data 2. More developed of the two Mediation Creates a view of the data 2/6/05 Newer and less developed Salman Azhar: Database Systems 10 Warehouse Diagram User query Result Warehouse 2/6/05 Wrapper Wrapper Source 1 Source 2 Salman Azhar: Database Systems 11 A Mediator Result User query Mediator Query Result Result Wrapper Query 2/6/05 Wrapper Result Source 1 Query Query Result Source 2 Salman Azhar: Database Systems 12 Warehousing Make copies of the data sources at a central site and transform it to a common schema Reconstruct data daily/weekly Do not try to keep it more up-to-date than that. Pro: very well-developed several commercial tools are available Con: 2/6/05 data can be old since updates are expensive 24-hour availability threatened by large data updates Salman Azhar: Database Systems 13 Mediation Create a view of all sources, as if they were integrated Answers a view query by translating it to terminology of the sources and querying them Pro: Current data Con: 2/6/05 Can be slow as it requires real time merger of different data sources Lack of tools available Salman Azhar: Database Systems 14 Warehouse Diagram User query Result Warehouse 2/6/05 Wrapper Wrapper Source 1 Source 2 Salman Azhar: Database Systems 15 A Mediator Result User query Mediator Query Result Result Wrapper Query 2/6/05 Wrapper Result Source 1 Query Query Result Source 2 Salman Azhar: Database Systems 16 Semi-structured: Motivation Most effective approach to Information Integration: Semi-structured Data Model or Semi-structured Objects 2/6/05 Salman Azhar: Database Systems 17 Semi-structured: Motivation Main limitation of Object-Oriented Models: Object Models are Strongly Typed Objects of a class have one structure only Semi-structured approach solves this problem 2/6/05 Salman Azhar: Database Systems 18 Semi-structured Data Purpose: Represent data from independent sources more flexibly than 2/6/05 either relational or object-oriented models Salman Azhar: Database Systems 19 Semi-structured Data Each object has a class of their own and properties are defined whatever labels are attached to that object Properties mean 2/6/05 attributes, relationships, methods, etc. Salman Azhar: Database Systems 20 Semi-structured Data Think of objects but with the type of each object is the objects its own business not that of its “class” Labels to indicate meaning of substructures 2/6/05 Salman Azhar: Database Systems 21 Semi-structured Graphs Easy to think of Semi-structured data as Graphs Nodes = objects Labels on arcs = 2/6/05 attributes leading to a leaf node relationships leading to another node Salman Azhar: Database Systems 22 Semi-structured Graphs Atomic values at leaf nodes nodes with no arcs out Flexibility: no restriction on… labels out of a node number of successors with a given label 2/6/05 Salman Azhar: Database Systems 23 Example: Data Graph Root object represents the entire DB. Often look like trees, but are not. root The restaurant object for KFC (arc-in called rest; arc-out labeled name to KFC) soda rest soda manf name sellsAt manf PepsiCo prize name year Pepsi Sobe name addr KFC Main St Notice a new kind of data. 2003 award BestSeller The soda object for Pepsi (arc-in called soda; arc-out called name to Pepsi) 2/6/05 Salman Azhar: Database Systems 24 Stage is Now Set for XML A technology has application to different situations 2/6/05 foundations remain the same applications changes Salman Azhar: Database Systems 25 Extensible Markup Language (XML) XML HTML uses tags for semantics (e.g., “this is an address”) uses tags for formatting (e.g., “italic”), Key idea: 2/6/05 create tag sets for a domain (e.g., genomics) translate all data into properly tagged XML docs Salman Azhar: Database Systems 26 Well-Formed and Valid XML Well-Formed XML allows you to invent your own tags similar to labels in semi-structured data graph Valid XML involves a DTD (Document Type Definition) DTD gives 2/6/05 a grammar for the use of labels limits the set of labels our of node the order and number of times a label occurs Salman Azhar: Database Systems 27 Well-Formed XML All XML documents have Header defines Header Body version specifies that the document is in well-formed XML Body can include 2/6/05 root tag several properly matching tags Salman Azhar: Database Systems 28 Well-Formed XML: Header Start the document with a declaration surrounded by <? … ?> . Normal declaration for Well-Formed XML is: <? XML VERSION = “1.0” STANDALONE = “yes” ?> Version indicates version number Standalone = “yes” means no DTD 2/6/05 no DTD means well-formed XML Salman Azhar: Database Systems 29 Well-Formed XML: Body Body of document is a root tag surrounding nested tags. Body can include: several properly matching tags special tag called root tag 2/6/05 (as in html structure) can have a special meaning such as document type or can be generic Salman Azhar: Database Systems 30 Tags Tags, as in HTML are normally matched pairs, as may be nested arbitrarily some tags requiring no matching ending 2/6/05 <BLAH> … </BLAH> such as <P> in HTML, are also permitted however, we will not use these in examples Salman Azhar: Database Systems 31 Example: Well-Formed XML <? XML VERSION = “1.0” STANDALONE = “yes” ?> <RESTS> <REST> <NAME>Taco Bell</NAME> One of several nested <SODA><NAME>Pepsi</NAME> REST tags representing <PRICE>1.00</PRICE></ SODA> information about a <SODA><NAME>Sobe</NAME> single REST <PRICE>2.00</PRICE></SODA> <NAME> tag specifies </REST > the REST name <REST> … <SODA> tags Literal Data items have names </REST > are contained at and price for the atomic level … each Soda </RESTS> Root tag RESTS surrounds the entire document nested in <NAME> and <PRICE> tags 2/6/05 Salman Azhar: Database Systems 32 XML and Semi-structured Data Consider this… Is Well-Formed XML documents with nested tags is exactly the same idea as trees of semi-structured data? Tags Nodes represent data between matching tags Parent-child relationship 2/6/05 are the labels on edges is immediate nesting in XML Salman Azhar: Database Systems 33 XML and Semi-structured Data Semi-structured approach allows for non-tree structures We shall see that XML also enables nontree structures 2/6/05 mimics the semi-structured data model Salman Azhar: Database Systems 34 Group Exercise Convert the following into a Semistructured representation <? XML VERSION = “1.0” STANDALONE = “yes” ?> <RESTS> <REST> <NAME>Taco Bell</NAME> <SODA><NAME>Pepsi</NAME> <PRICE>1.00</PRICE></ SODA> <SODA><NAME>Sobe</NAME> <PRICE>2.00</PRICE></SODA> </REST > <REST> … Note: Do not turn over to the </REST > next page before attempting … </RESTS> this exercise yourself! 2/6/05 Salman Azhar: Database Systems 35 Solution: The semi-structured representation <? XML VERSION = “1.0” STANDALONE = “yes” ?> <RESTS> <REST> <NAME>Taco Bell</NAME> RESTS <SODA><NAME>Pepsi</NAME> <PRICE>1.00</PRICE></ SODA> <SODA><NAME>Sobe</NAME> <PRICE>2.00</PRICE></SODA> REST </REST > <REST> … </REST > NAME … SODA </RESTS> REST REST SODA Taco Bell NAME Pepsi 2/6/05 PRICE 1.00 NAME Sobe PRICE ... Note: Data is stored in leaf nodes and structure (tags) in internal nodes 2.00 Salman Azhar: Database Systems 36 Valid XML Switching gears: Well-formed to Valid XML Valid XML is the most interesting use of XML Essentially a context-free grammar for describing XML tags and their nesting Specified by DTD Each domain of interest creates one DTD that describes all the documents this group will share 2/6/05 For example, electronic components, travel industry, etc., will have their own DTDs Salman Azhar: Database Systems 37 DTD Structure Note: !DOCTYPE is key word with <root tag> being the name of DOCTYPE <!DOCTYPE <root tag> [ <!ELEMENT <name> ( <components> ) <more elements> ]> Between [ … ] list of ELEMENT definition Each !ELEMENT has a <name> with the allowed list of <components> usually in the order listed 2/6/05 Salman Azhar: Database Systems 38 DTD Elements Element definition consists of its name (tag) and a parenthesized description of any nested tags includes order of subtags and their multiplicity (0, 1, or many times) Leaves (text elements) 2/6/05 have #PCDATA in place of nested tags Salman Azhar: Database Systems 39 Example: DTD <!DOCTYPE RESTS [ RESTS can have * (0 or more) REST <!ELEMENT RESTS (REST*)> REST has NAME and <!ELEMENT REST (RNAME, SODA+)>then + (1 or more) SODA… Order matters! <!ELEMENT NAME (#PCDATA)> SODA has NAME followed PRICE SODA’s NAME and PRICE are data (#PCDATA) NAME and PRICE are data (#PCDATA): No more tags just text GROUP EXERCISE: COMPLETE THE DTD ]> Note: Do not turn over to the next page before attempting this exercise yourself! 2/6/05 Salman Azhar: Database Systems 40 Example: DTD <!DOCTYPE RESTS [ RESTS can have * (0 or more) REST <!ELEMENT RESTS (REST*)> REST has NAME and <!ELEMENT REST (RNAME, SODA+)>then + (1 or more) SODA… Order matters! <!ELEMENT NAME (#PCDATA)> <!ELEMENT SODA (NAME, PRICE)> NAME and PRICE are data (#PCDATA): No more tags just text <!ELEMENT NAME (#PCDATA)> <!ELEMENT PRICE (#PCDATA)> SODA has NAME followed PRICE ]> 2/6/05 Salman Azhar: Database Systems 41 Element Descriptions Rules Subtags must appear in order shown A tag may be followed by a symbol to indicate its multiplicity: Identical to UNIX regular expressions. * = zero or more. + = one or more. ? = zero or one. Alternative sequences of tags can be connected by the symbol | 2/6/05 Salman Azhar: Database Systems 42 Example: Element Description A name is Either an optional title (e.g., “Dr.”), a first name, and a last name, in that order, or it is an IP address <!ELEMENT NAME ( (TITLE?, FIRST, LAST) | IPADDR Alternative symbol )> 2/6/05 Salman Azhar: Database Systems 43 Use of DTDs In order to specify a document follows a particular DTD 1. Set STANDALONE = “no” a) b) 2/6/05 Either include the DTD as a preamble of the XML document Follow DOCTYPE and the <root tag> by SYSTEM and a path to the file where the DTD is stored Salman Azhar: Database Systems 44 Example (a) <? XML VERSION = “1.0” STANDALONE = “no” ?> <!DOCTYPE RESTS [ DTD <!ELEMENT RESTS (REST*)> <!ELEMENT REST (NAME, SODA+)> <!ELEMENT NAME (#PCDATA)> <!ELEMENT SODA (NAME, PRICE)> <!ELEMENT NAME (#PCDATA)> <!ELEMENT PRICE (#PCDATA)> ]> Document <RESTS> Same as earlier but this time it conforms to the above DTD <REST> <NAME>Taco Bell</NAME> <SODA><NAME>Pepsi</NAME> <PRICE>1.00</PRICE></ SODA> <SODA><NAME>Sobe</NAME> <PRICE>2.00</PRICE></SODA> </REST > <REST> … </REST > … </RESTS> 2/6/05 Salman Azhar: Database Systems 45 Example (b) Assume the RESTS DTD is in file rest.dtd <? XML VERSION = “1.0” STANDALONE = “no” ?> Get the DTD <!DOCTYPE Rests SYSTEM “rest.dtd”> <RESTS> <REST> <NAME>Taco Bell</NAME> <SODA><NAME>Pepsi</NAME> <PRICE>1.00</PRICE></ SODA> <SODA><NAME>Sobe</NAME> <PRICE>2.00</PRICE></SODA> </REST > <REST> … </REST > … </RESTS> 2/6/05 Salman Azhar: Database Systems from the file rest.dtd Document Same as earlier but this time it conforms to the DTD in rest.dtd 46 Attributes Attributes are another important component of DTD and XML docs Opening tags in XML can have attributes like <A HREF = “…”> in HTML In DTD <!ATTLIST <elementname>… > 2/6/05 gives a list of attributes and their data types for this element Salman Azhar: Database Systems 47 Example: Attributes Rests can have an attribute kind which is either qsr, family, or other. The element definition is unchanged However, we add an ATTLIST. <!ELEMENT REST (NAME SODA*)> <!ATTLIST REST kind “qsr” | “family” | “other”> 2/6/05 Salman Azhar: Database Systems 48 Example: Attribute Use In a document that allows REST tags, we might see: <REST kind = “qsr”> New info: kind = “qsr” <NAME>KFC</NAME> <SODA><NAME>Pepsi</NAME> <PRICE>1.00</PRICE></SODA> ... </REST> 2/6/05 Salman Azhar: Database Systems 49 IDs and IDREFs Introduce links from one object to another Allows the structure of an XML document to be a general graph rather than just a tree. These are pointers from one object to another 2/6/05 in analogy to HTML’s NAME = “blah” and HREF = “#blah” Salman Azhar: Database Systems 50 Creating IDs We give an element Elephant an attribute Attention of type ID in the DTD When using tag <Elephant> in an XML document, give its attribute Attention a unique value. For example, 2/6/05 <Elephant Attention = “213”> Salman Azhar: Database Systems 51 Creating IDREFs IDREFs are similar to IDs: To allow objects of type Fig to refer to another object with an ID attribute, Or, let the attribute have type IDREFS, 2/6/05 give Fig an attribute of type IDREF (single string of type ID) so the Fig –object can refer to any number of other objects (any number strings of type ID). Salman Azhar: Database Systems 52 Example: IDs and IDREFs Let us redesign our RESTS DTD to include both REST and SODA sub-elements Both rests and sodas will have ID attributes called name Rests have PRICE sub-objects, Sodas have attribute soldBy, 2/6/05 consisting of a number (the price of one soda) and an IDREF theSoda leading to that soda which is an IDREFS leading to all the rests that sell it Salman Azhar: Database Systems 53 The DTD RESTS have 0+ REST and 0+ SODA <!DOCTYPE Rests [ <!ELEMENT RESTS (REST*, SODA*)> REST objects have name as an <!ELEMENT REST (PRICE+)> ID attribute and have one or more PRICE sub-objects <!ATTLIST REST name ID> PRICE objects <!ELEMENT PRICE (#PCDATA)> have a <!ATTLIST PRICE theSoda IDREF> number (the price) and <!ELEMENT SODA ()> one reference to a soda <!ATTLIST SODA name ID, soldBy IDREFS> ]> Soda objects have an ID attribute called name, and a soldBy attribute that is a set of Rest names 2/6/05 Salman Azhar: Database Systems 54 Example Document <RESTS> <REST name = “Taco Bell”> <PRICE theSoda = “Pepsi”>1.00</PRICE> <PRICE theSoda = “Sobe”>2.00</PRICE> </REST> … <SODA name = “Pepsi”, soldBy = “KFC, TacoBell,…”> </SODA> … </RESTS> 2/6/05 <!DOCTYPE Rests [ <!ELEMENT RESTS (REST*, SODA*)> <!ELEMENT REST (PRICE+)> <!ATTLIST REST name ID> <!ELEMENT PRICE (#PCDATA)> <!ATTLIST PRICE theSoda IDREF> <!ELEMENT SODA ()> <!ATTLIST SODA name ID, soldBy IDREFS> ]> Salman Azhar: Database Systems 55 Recap Semi-structured Data XML (Extensible Markup Language) Well-formed and Valid XML Document Type Definitions IDs and IDREFs 2/6/05 Salman Azhar: Database Systems 56 Perspective Here XML is used as a EDI medium EDI = electronic data interchange There are many other using for XML 2/6/05 Each has its own utilization Salman Azhar: Database Systems 57 Questions? Questions??? 2/6/05 Doesn’t mean you will get all the answers! Salman Azhar: Database Systems 58