XML, distributed databases, and OLAP/warehousing
- The semantic web and a lot more

What is XML?
- A framework for declarative languages
- A syntax and two major constructs: elements & attributes
- Elements:
  - Have begin and end tags
  - Can be embedded
  - Can be put in lists (homogeneous or heterogeneous)
- Attributes:
  - Are assigned to elements
  - Are strings
  - Are put in quotes

What is XML for?
- Initially, as a cornerstone of the semantic web
  - Automatic searching of the web (versus interactive)
  - Self-describing data
- Has been adapted to a wide variety of application domains
  - As a means for specifying the structure of data
  - As a catch-all for nontraditional data

XML documents
- An instance of XML is a language
- An instance of an XML language is a document
- Documents are hierarchical & list-oriented
- XML documents can be parsed in a single, linear pass
- There is no notion of a fixed schema
- Does not leverage meta data for set-oriented queries
- Order matters in a set of documents
- Order matters in a series of elements in a document

Is it a generalized HTML?
- Sort of, but perhaps more of a meta alternative to HTML
- The real point is to allow HTML pages to be located and searched automatically
- This is done by allowing language developers to create their own names for documents, elements, & attributes

What else is part of the XML philosophy?
- Namespaces
  - Associated with URLs
  - Can be referenced in a nested fashion in an XML document
- Widely distributed sharing of data, XML languages, and namespaces

What’s missing, from the database user’s and the programmer’s perspective?
- No innate notion of a query language
- No objects
- Very limited data structuring capabilities
- Yet another impedance mismatch problem
- No way to store XML documents in a relational database, at least not natively
- No way to make a database out of a set of documents

So, in response to the database community’s desires…
- A hierarchical query language: XPath
- A specification format for schemas: DTDs
  - But uses a different syntax
  - Does not accommodate namespaces

So, in response to the database community’s desires, phase 2…
- XML Schema
  - More atomic or “basic” types
  - Like DTDs, but with an XML syntax
  - Supports namespaces
  - Adds primary keys and foreign keys
  - Adds more constructs for structuring data
  - Simple types: primitive types, list and union, & restriction
  - Attributes can be of simple types
  - Complex types: compositors all (unordered), sequence (ordered), and choice
  - Extension and restriction
  - Integrity constraints

Query language 1: XPath
- Follows hierarchy of XML documents
- Uses syntax borrowed from Unix file system
  - / for root
  - . for current node
  - @ for an attribute of a node
  - [1], [2], etc., for selecting among siblings by position
  - // for descendant-or-self
  - .//x for all descendants of the current node that are elements of type x
- Augmented with URLs to create XPointer
- Relational database systems generally have an XML data type now

Distributed Databases & Distributed Transactions: homogeneous and heterogeneous
- See page 689: multiple DBs vs. a distributed DB

Homogeneous distributed DBs
- Single unified schema
- Designed top down
- Distribution by row, column, table, or table selection

Issues of distribution
- Redundancy: availability vs.
  keeping copies up to date
- Hidden joins with column distribution
- Hidden unions with table selection distribution

Executing distributed transactions
- Each node has a master and a client module
  - Masters are all identical and contain distributed data info
  - Clients are like single-site databases with a prepare-to-commit
- 3 basic strategies for query fragment execution
  - Bring data to procedure
  - Send procedure to data
  - Meet in a 3rd place
- Estimating costs
  - Data shipping
  - Result shipping
  - Wait times on nodes
- Integrity constraint enforcement

Heterogeneous distributed databases
- Forms of heterogeneity
  - Model
  - Schema
  - Database product
  - Namespace
  - Table structure (implications for object identities)
  - Keys and foreign keys
  - Units
  - SQL dialect
- Semantic issues relating to varying interpretations of data

Integrating heterogeneous databases
- After the fact
- Stability is never achieved
- Mappings are complex
- Data may have conflicts, redundancy, and gaps
- Closed world vs. open world

Engineering for nonstop change
- Mediators around databases
- Gateways connecting old apps and new databases
- Gateways connecting new apps and old databases
- A stability of instability

OLAP
- Standard model
  - N dimension tables
  - 1 fact table (PK is union of keys of dimension tables)
- Hypercube visualization
- Multidimensional table result visualizations
- Star and constellation schemas
- Terminology
  - Drilling down: stepping down nested attributes
  - Rolling up: moving up nested attributes
  - Pivot: group by
- Specialized operators
  - Cube operator and 4 equivalent queries
- Viewing results
  - See page 722
  - Equivalent: see page 723

Populating the warehouse
- Transformation
- Integration
- Cleaning

Data mining
- Effectively an open world application
- Association, classification, clustering (page 730)
- Association: confidence and support (page 731)
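
The two association measures named in the data mining slide can be made concrete with a small sketch. It uses the usual textbook definitions (support of a rule is the fraction of transactions containing all of its items; confidence is that support divided by the support of the left-hand side), which may differ in detail from the presentation on the cited page. The transactions and the function names `support` and `confidence` are hypothetical, invented for illustration.

```python
# Hypothetical market-basket transactions; item names are invented.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Of the transactions containing lhs, the fraction that also contain rhs."""
    both = support(set(lhs) | set(rhs), transactions)
    return both / support(lhs, transactions)


# Rule under test: bread -> milk
rule_support = support({"bread", "milk"}, TRANSACTIONS)
rule_confidence = confidence({"bread"}, {"milk"}, TRANSACTIONS)
```

Here two of the four baskets contain both bread and milk, and three contain bread, so the rule bread -> milk has support 2/4 and confidence 2/3.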
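
The OLAP slide mentions a cube operator and "4 equivalent queries". One plausible reading, assumed here, is that a cube over two dimensions produces the same result as four separate group-by queries, one for each subset of the dimensions. The sketch below computes those groupings in plain Python. The fact rows, the dimension names, and the `cube` function are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (product, region, sales). Values are invented.
FACTS = [
    ("pen", "east", 10),
    ("pen", "west", 20),
    ("ink", "east", 5),
]
DIMS = ("product", "region")


def cube(rows, dims):
    """Sum the measure for every subset of the dimensions (2**len(dims) groupings)."""
    result = {}
    for k in range(len(dims) + 1):
        for subset in combinations(range(len(dims)), k):
            totals = defaultdict(int)
            for row in rows:
                # Dimensions outside the subset are rolled up and shown as "ALL".
                key = tuple(
                    row[i] if i in subset else "ALL" for i in range(len(dims))
                )
                totals[key] += row[-1]
            result[tuple(dims[i] for i in subset)] = dict(totals)
    return result


groupings = cube(FACTS, DIMS)
```

With two dimensions, `groupings` holds four entries: the grand total, totals by product, totals by region, and totals by product and region together.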
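
The XPath notation summarized in the XPath slide can be tried out with Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath. This is a minimal sketch, not a full XPath engine. The document, its element names, and its attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; element and attribute names are invented.
DOC = """
<library>
  <shelf id="s1">
    <book year="1999"><title>First</title></book>
    <book year="2005"><title>Second</title></book>
  </shelf>
  <shelf id="s2">
    <book year="2011"><title>Third</title></book>
  </shelf>
</library>
"""

root = ET.fromstring(DOC)

# .//title : every title element below the current node (here, the root).
all_titles = [t.text for t in root.findall(".//title")]

# shelf/book[1] : the first book child of each shelf (positions start at 1).
first_books = [b.find("title").text for b in root.findall("shelf/book[1]")]

# [@year='2005'] : a predicate that tests an attribute value.
from_2005 = [b.find("title").text for b in root.findall(".//book[@year='2005']")]

# ElementTree returns elements rather than attribute nodes, so attribute
# values are read with .get() once the path has selected the element.
shelf_ids = [s.get("id") for s in root.findall("./shelf")]
```

Because ElementTree selects elements only, the `@` step appears here inside a predicate. A complete XPath implementation would also allow a path to end in an attribute.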