Semistructural databases
Database lectures for mathematics students
Zbigniew Jurkiewicz, Institute of Informatics UW
May 29, 2016

Semistructural data

Due to the dramatic growth of the WWW and the Internet, it is now easy to place information on the net and make it publicly available. It is natural to try to use this information as a database. However, data stored in the form of HTML or XML files has an irregular structure, so people started to call such data sources semistructural (more commonly: semistructured) data.

Disadvantage: chaotic query languages, similar to procedural query languages like Cobol 30 years ago.

Semistructural data

- Data model based on trees
- Flexible representation of data: directed graph
- Schema included in the data: “self-describing” data
- Useful for information integration (e.g. virtual data warehouses)
- Good model for storing XML

Example

(figure not preserved in this text)

Semistructural graph

- Nodes = objects
- Arc labels = object attributes
- Atomic values in tree leaves
- Flexibility: no restrictions on the labels of outgoing arcs or on the number of descendants

Queries

- Query languages are based on the concept of path expressions (e.g. Lorel).
- Path expression = regular expression describing a path from the root.
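A path expression can be read as a regular expression over the sequence of arc labels leading from the root to a node. The following is a minimal Python sketch of that idea, not an implementation of Lorel: the graph `BIBLIO`, the helpers `walk` and `select`, and the sample values are all hypothetical, and the path expressions are translated to Python regular expressions by hand, assuming `_` stands for any single label and `_*` for any sequence of labels.

```python
import re

# Hypothetical bibliography graph: dict keys are arc labels, a list holds
# several children reached by the same label, strings are atomic leaf values.
BIBLIO = {
    "biblio": {
        "book": [{"title": "Book one", "author": ["Jeffrey Ullman", "Author B"]}],
        "article": [{"title": "Article one", "author": "Author C"}],
        "report": [{"section": {"author": "Author D"}}],
    }
}


def walk(node, path=()):
    """Yield (dotted_label_path, node) for every node below `node`."""
    if isinstance(node, dict):
        for label, child in node.items():
            children = child if isinstance(child, list) else [child]
            for c in children:
                new_path = path + (label,)
                yield ".".join(new_path), c
                yield from walk(c, new_path)


def select(pattern, root):
    """Return the nodes whose whole label path matches the regex `pattern`."""
    return [node for path, node in walk(root) if re.fullmatch(pattern, path)]


# biblio.book|article.author  ->  authors of books or articles
print(select(r"biblio\.(book|article)\.author", BIBLIO))

# biblio._*.author  ->  authors at any depth below biblio
print(select(r"biblio(\.\w+)*\.author", BIBLIO))
```

The second pattern also reaches the author nested under `report.section`, which the first one does not.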
Example paths

    biblio.book|article.author
    biblio._*.author

Example queries

All book authors (query z1):

    select author: x
    from biblio.book.author x;

All items having Jeffrey Ullman as author (query z2):

    select item: x
    from biblio._ x
    where "Jeffrey Ullman" in x.author;

Example queries

Authors of items which have “database” occurring in the title (query z3):

    select author: y
    from biblio._ x, x.author y, x.title z
    where ".*(D|d)atabase.*" ~ z;

Authors and titles of all books (query z4):

    select item: title: y, author: z
    from biblio.book x, x.title y, x.author z;

XML

The rapid expansion of the WWW, which was based on pages written in HTML, resulted in the rediscovery of the advantages of a parenthesized representation of structured data.

XML (eXtensible Markup Language) is a step towards standardization of the representation of data stored in text files and sent over the network.

Because XML is a simplified version of SGML, the main XML object is traditionally called a document, even though it often contains no ordinary text, only structured data.

Most CASE tools can import and export files in XML; some of them even use it as their native storage representation.

XML

- eXtensible Markup Language
- Documents marked with tags
- Extensible: user-defined semantic tags, e.g. <student>
- HTML contains only a fixed set of “presentational” tags (i.e. useful for formatting and display), e.g.
<blockquote>
- WWW Consortium page: http://www.w3.org/XML/

XML

Generally, XML can be used to represent any data that has structure. An XML expression is a fully parenthesized form of data representation. Parentheses in XML have labels (in other words, they correspond to phrase markers from formal grammars). For example, instead of writing a list of the numbers 3, 5 and 4 in the simple form

    (3 5 4)

in XML we write

    <list>3 5 4</list>

The constructs <list> and </list> are called the start tag and the end tag, but you can regard them simply as opening and closing parentheses.

XML

Using XML you can put tags around a nearly arbitrary sequence of characters. A pair of matching tags together with the text between them is called an XML element. The character sequence inside the tag is the element's name, and the text between the tags is the element's content. Moreover, a start tag may contain attributes, e.g. the element

    <list title="grades" date="2004-10-22"> 3 5 4 </list>

has two attributes: title and date. Their values are the strings "grades" and "2004-10-22".

XML

The following examples show two methods of representing a grades register:

- As a parenthesized list. This could be an internal representation used in a program written in Lisp, Scheme, Dylan or Ruby.
- As an XML expression. Note that in this form the information about the subject and semester is given as attributes of the element <exam>, which helps with information exchange. Similarly, the index of a student is given as an attribute of the element <grades>. Additionally, the points for each exercise are put in a separate element.
List

    ("Databases" "Spring/2009"
      ("201" 78 88 69)
      ("202" 88 87 86)
      ("203" 99 88 88)
      ("204" 77 78 77)
      ("205" 90 89 81)
      ("206" 67 78 81))

Representing an exam protocol as a list

XML

    <exam subject="Databases" semester="Spring 2009">
      <grades index="201"> <pts>78</pts> <pts>88</pts> <pts>69</pts> </grades>
      <grades index="202"> <pts>88</pts> <pts>87</pts> <pts>86</pts> </grades>
      <grades index="203"> <pts>99</pts> <pts>88</pts> <pts>88</pts> </grades>
      <grades index="204"> <pts>77</pts> <pts>78</pts> <pts>77</pts> </grades>
      <grades index="205"> <pts>90</pts> <pts>89</pts> <pts>81</pts> </grades>
      <grades index="206"> <pts>67</pts> <pts>78</pts> <pts>81</pts> </grades>
    </exam>

Representing an exam protocol in XML

XML

We see clearly that XML generalizes parenthesized list notation. Parentheses are named (labeled), and each parenthesized element may have additional attributes.

The XML document schema (i.e. the hierarchical document structure and the tags used for elements and attributes) should be defined beforehand. This is done in a separate document. Initially DTD (Document Type Definition) was used; the more modern solution is XML Schema.

A document schema description can be perceived as the definition of the vocabulary used (the so-called “semantic web” is a more global approach).

Element

Element = any document fragment contained between a matching pair of tags, for example

    <actor> ... </actor>

Simple (empty) elements, e.g. <br /> in HTML, are an exception to this rule. They have no content and do not occur in pairs.

Note: some HTML browsers insist on a space before the slash.
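The terms element name, attribute, and content map directly onto what an XML parser exposes. As a minimal sketch, the following Python code reads a shortened copy of the exam protocol with the standard `xml.etree.ElementTree` module and sums the points per student; the variable names (`EXAM_XML`, `totals`) are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the exam protocol (first two students only).
EXAM_XML = """
<exam subject="Databases" semester="Spring 2009">
  <grades index="201"> <pts>78</pts> <pts>88</pts> <pts>69</pts> </grades>
  <grades index="202"> <pts>88</pts> <pts>87</pts> <pts>86</pts> </grades>
</exam>
""".strip()

root = ET.fromstring(EXAM_XML)

# Element name and attributes of the outermost element.
print(root.tag, root.attrib["subject"], root.attrib["semester"])

# Each <grades> element carries the student index as an attribute and
# the points as the text content of its <pts> children.
totals = {}
for grades in root.findall("grades"):
    points = [int(pts.text) for pts in grades.findall("pts")]
    totals[grades.get("index")] = sum(points)

print(totals)
```

Attribute values and element content both arrive as strings, so the points have to be converted to integers explicitly.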
Auxiliary tags

- Comments: <!-- ... -->
- Processing instructions: <?name ... ?>
  For example, an XML document should start with the instruction

      <?xml version="1.0" ?>

Document

XML document = single element. It may be preceded by an optional prolog containing:

- the XML declaration

      <?xml version="1.0" ?>

- a document type definition (DTD), usually given by a reference to a separate file

      <!DOCTYPE name of the main element SYSTEM "file.dtd">

XML vs HTML

- Lowercase and uppercase letters in tags are distinct.
- Attribute values must always be in quotes.
- No implicit termination of tags (e.g. </p> cannot be omitted).

XML correctness levels

- Well-formed: syntactically correct, with paired tags: each opening tag (e.g. <student>) must have a corresponding closing tag (e.g. </student>). Does not need a DTD.

      <?xml version="1.0" standalone="yes" ?>
      <body> ... </body>

- Valid: described by some DTD (Document Type Definition) and consistent with it ;-)

      <?xml version="1.0" standalone="no" ?>
      <!DOCTYPE Student SYSTEM "student.dtd">
      <Student> ... </Student>

DTD: tags

    <!DOCTYPE main-tag [ element ... ]>
    <!ELEMENT tag (component,...)>

Example

    <!DOCTYPE Students [
      <!ELEMENT Students (Student*)>
      <!ELEMENT Student (firstname,lastname,address,year)>
      <!ELEMENT firstname (#PCDATA)>
      <!ELEMENT lastname (#PCDATA)>
      ...
    ]>

DTD: tags

    ?          optional element
    *          closure (occurs 0 or more times)
    |          alternative
    #PCDATA    any text without tags
    CDATA      any text
    #REQUIRED  required attribute

DTD: example use

    <?xml version="1.0" standalone="no" ?>
    <!DOCTYPE Students SYSTEM "student.dtd">
    <Students>
      <Student>
        <firstname>Onufry</firstname>
        <lastname>Zagłoba</lastname>
        <address>Dzikie Pola</address>
        <year>1648</year>
      </Student>
      <Student> ... </Student>
      ...
    </Students>

Example DTD

    <!DOCTYPE exchange [
      <!ELEMENT exchange (title?, rate*)>
      <!ELEMENT rate (#PCDATA)>
      <!ATTLIST rate
        currency CDATA #REQUIRED
        type (sale|purchase|average) "average">
      ...
    ]>

DTD usage

    <?xml version="1.0" ?>
    <exchange>
      <title>Exchange rates</title>
      <rate currency="USD">4,235</rate>
      ...
    </exchange>

DTD: attributes

- Placed in the opening tag.
- Form: attribute="value"
- Also used as links for connecting elements.
- Declared by

      <!ATTLIST element attribute type ...>

DTD: example with attributes

    <!DOCTYPE Students [
      <!ELEMENT Students (Student*)>
      <!ELEMENT Student (firstname,lastname,address,year)>
      <!ATTLIST Student
        studentID ID
        attends IDREFS>
      <!ELEMENT lastname (#PCDATA)>
      ...
    ]>

Another example (note that the values in an IDREFS attribute are separated by spaces)

    <?xml version="1.0" standalone="no" ?>
    <!DOCTYPE Students SYSTEM "student.dtd">
    <Students>
      <Student studentID="OZ" attends="ms gpp">
        <firstname>Onufry</firstname>
        <lastname>Zagłoba</lastname>
        <address>Dzikie Pola</address>
        <year>1648</year>
      </Student>
      <Student> ... </Student>
      ...
    </Students>

XML Schema

Nowadays the newer XML Schema notation is often used instead of DTD, as it is more expressive. Example description of a student element:

    <xsd:element name="student">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element name="first-name" type="xsd:string"/>
          <xsd:element name="last-name" type="xsd:string"/>
          <xsd:element name="address" type="xsd:string"/>
          <xsd:element name="year" type="xsd:int"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

Linking attributes

- Type ID marks an identifying attribute, for use in other elements.
- Type IDREF is a reference to the value of an ID attribute in another element.
- Missing “type discipline”: an IDREF may point to an element of any type.
- This is the simplest mechanism; there are more advanced ones: XLink and XPointer.

Connections between documents

- XML sublanguage XLL (eXtensible Link Language), also known as XLink.
- More descriptive possibilities than HTML references.
- One can create a reference to many documents (with selection by the user) and to groups of documents (similar to a frame with a list).

Displaying

A document written in XML can be transformed into another form or placed on a WWW server. For display in a browser, however, we must describe the method of display. For simple applications it is enough to use Cascading Style Sheets (CSS).

The CSS sheet to be used is declared in the document header with

    <link rel="stylesheet" type="text/css" href="name.css">

A style for an element may also be specified directly using the style attribute:

    <li style="color: red">

Displaying

- SGML uses DSSSL for transformations.
- In HTML, cascading style sheets (CSS) serve to change the standard way tags are displayed.
- In XML we may also specify transformations applied to tags with XSL (eXtensible Stylesheet Language), e.g.
  - display the tag <exchange> as an HTML table (<table>),
  - display the tag <rate> as a table row (<tr>).
- Using an appropriate XML parser we can convert a document into any other form (e.g. TeX).

XML for information exchange

An important and typical application of XML is the exchange of information between different CASE tools. The structure of an application modeled in UML in a tool such as Rational Rose can be exported as an XML document and read into another tool, e.g. a generator for special applications, or simply put on a WWW server.

General mapping rules:

- objects ↔ XML documents
- classes ↔ XML schemas

XMI

- OMG (Object Management Group, www.omg.org) proposed a standard format for information exchange called XMI (XML Metadata Interchange): http://cgi.omg.org/cgi-bin/doc?ad/01-06-12
- Much interesting information can also be found at http://XMLmodeling.com.
- Umbrello uses XMI.

Programming

- XPath is a language (or, more precisely, a notation for patterns) for selecting a set of nodes within a hierarchical document.
- XQuery is an XPath-based language for querying XML documents. A query is a search statement that retrieves the specific portions of a document that satisfy a given search criterion.

XPath

A notation for describing a set of elements (nodes). In the simplest case an expression is a sequence of element names, for example

    /Students/Student/name

describes the contents of all <name> elements.
To fetch the name of the first student we can use an index:

    /Students/Student[1]/name

We can use functions, e.g.

    count(/Students/Student)

returns the number of students.

Namespaces

The term namespace corresponds to modules or packages in ordinary programming languages. Namespaces are usually named with URL addresses, e.g.

    http://www.w3.org/2001/XMLSchema

Remark: the use of a URL does not imply that there must be a document at this address; the XML parser does not try to look at it. Usually, however, a document at that address contains some description of the namespace.

As URLs are long, aliases are usually declared, e.g.

    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

SQL/XML

New part in SQL:2003.

- A new built-in type, XML.
- 4 built-in operators:
  - XMLPARSE: returns a value of XML type given an SQL character string expression.
  - XMLSERIALIZE: returns a value of character string type given an XML expression.
  - XMLROOT: modifies the root information item of an XML value and returns the modified value.
  - XMLCONCAT: concatenates two or more XML values and returns the resulting value.

SQL/XML

- A predicate, IS DOCUMENT, to test whether an XML value has a single root element.
- 5 “publishing functions” that generate values of XML type from SQL expressions: XMLELEMENT, XMLFOREST, XMLATTRIBUTES, XMLNAMESPACES, XMLAGG.
- Host language bindings for values of XML type.

XML type

- A new SQL built-in type.
- Can be used wherever an SQL data type is allowed: as the type of a table column, a parameter of a routine, an attribute of a UDT, or an SQL variable.
- Strongly typed: values of XML type are distinct from their textual representation.
- The semantics of operations on values of XML type is specified by assuming a tree-based internal representation based on the XML Information Set Recommendation (Infoset).
- The Infoset model is modified in one significant way: the document information item of Infoset is replaced by a new kind of information item, the XML root information item.

XML in Postgres

PostgreSQL offers XML-related functionality based on the SQL/XML standard. In version 8.3 it covered XML syntax checking and XPath queries. XML columns are declared using the xml data type, e.g.

    CREATE TABLE test (a xml, b xml);

Specialized functions have been added to work with XML values, e.g.

    SELECT xmlelement(name test, xmlattributes(a, b)) FROM test;

Starting with PostgreSQL 9.2, a native data type for JSON values is available. JSON is another semistructural representation, which originated in the JavaScript programming language.
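The XPath expressions from the XPath slide can also be tried outside a database. The sketch below uses Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath: paths are written relative to the root element, and `count()` is not available, so `len()` is used instead. The sample document and its <name> values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample document with the <Students>/<Student>/<name> shape
# assumed by the XPath examples.
DOC = """
<Students>
  <Student><name>Onufry</name></Student>
  <Student><name>Jan</name></Student>
</Students>
""".strip()

root = ET.fromstring(DOC)  # root is the <Students> element

# /Students/Student/name, written relative to the root element
all_names = [e.text for e in root.findall("Student/name")]

# /Students/Student[1]/name (XPath positions start at 1)
first_name = root.find("Student[1]/name").text

# count(/Students/Student), emulated with len()
student_count = len(root.findall("Student"))

print(all_names, first_name, student_count)
```

A full XPath engine would accept the absolute paths and `count()` directly; the relative form here is a limitation of the chosen library, not of XPath.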