8. Special database types

Distributed databases
• Distribution of data:
  – Several host sites.
  – Availability and reliability: replicated data
  – Distributed concurrency control
• Distribution of users:
  – Client-server architecture
  – Web databases; three-tier architecture

AdvDB-8 J. Teuhola 2015

Distributed databases: Requirements
• Replication and partitioning of data
• Maintenance of a location map for data
• Query optimization for multiple hosts
• Maintenance of consistency among replicas after update operations
• Recovery from network failures
• Partial usability when some hosts are down
• Management and control of access rights

Distributed databases: Advantages
• Improved efficiency by replication: data close to users, preferably in the local host.
• Improved reliability by replication: When one host is down, others continue to operate. Data is accessible when one copy is available.
• Transparency: The user does not need to know the location of data / replicas / partitions.
• Extensibility: new nodes can be added to the network.

Example: distributed join
• Relation R(X, Y, Z) stored in host A
• Relation S(Z, W) stored in host B
• Steps of natural join R * S for host A:
  – Send column R(Z) from A to B
  – Compute semijoin T(Z, W) = R(Z) * S(Z, W) in B
  – Send relation T back to A
  – Compute the final join R * T
• Note: the last step can be replaced by concatenation if duplicates are maintained in W and T

Deductive (logic) databases
Main features:
• ‘Data’ consists of facts and rules.
• Declarative language to define them
• Inference engine = deduction mechanism for solving queries
Related areas:
• Relational data model (esp. relational calculus)
• Logic programming (Prolog)
• Datalog: Subset of Prolog

Deductive databases: Example in Datalog
Facts: parent(x, y) means that y is x’s parent
  parent(peter,mary).
  parent(peter,paul).
  parent(mary,john).
  parent(paul,joan).
Rules: ancestor(x, y) means that y is x’s ancestor
  ancestor(X,Y) :- parent(X,Y).
  ancestor(X,Y) :- parent(X,Z),ancestor(Z,Y).
Queries: (1) ancestors of Peter, (2) descendants of Joan
  ?- ancestor(peter,?).
  ?- ancestor(?,joan).

Data warehouses
• Support for decision making.
• Derived, integrated and refined from operational databases.
• No transaction processing, not quite up-to-date.
• Multidimensional view of data (data cube)
• OLAP = On-Line Analytic Processing.
• Summary and multidimensional data.
• Statistical analysis tools.
• Data mining tools.

Example: data cube on sales
• Sales values per salesman, product and date
[Figure: data cube with axes Salesperson, Date, Product]

Example: ‘Star’ schema for data warehouse
‘Fact table’: Sales
  SalesTable(ProdNo, AreaNo, Date, Amount, Value)
‘Dimension tables’: Prod, Area, Time
  ProdTable(Prod-no, Name, Descr, Group)
  AreaTable(AreaNo, Name, Seller)
  TimeTable(Date, DayOfWeek)

XML databases: ‘semi-structured data’
• Storage and retrieval of XML documents: structured using nested pairs of tags
• Flexible, hierarchical schema
• Alternative implementations for XML databases:
  – Relational database: various alternatives
  – Object database: more direct mapping of the structure
  – Native XML database: built from scratch, tailored especially for this data type
• Query Language: XQuery

Example document collection: 2 courses

<?xml version="1.0"?>
<course>
  <cname>Adv DB</cname>
  <teacher>Timo</teacher>
  <audience>
    <student>Pasi</student>
    <student>Pirjo</student>
  </audience>
</course>

<?xml version="1.0"?>
<course>
  <cname>C++</cname>
  <teacher>Esa</teacher>
  <audience>
    <student>Pasi</student>
    <student>Pia</student>
  </audience>
</course>
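
To make the retrieval side of the two course documents concrete, here is a minimal sketch in Python. It is a stand-in for a real XML database and for XQuery, not an example of either: it uses only the standard-library `xml.etree.ElementTree` module, and the names `DOC1`, `DOC2` and `courses_of` are hypothetical, introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# The two course documents of the example collection, held as plain strings
# (an assumption for this sketch; a real XML database would store them itself).
DOC1 = """<?xml version="1.0"?>
<course>
  <cname>Adv DB</cname>
  <teacher>Timo</teacher>
  <audience>
    <student>Pasi</student>
    <student>Pirjo</student>
  </audience>
</course>"""

DOC2 = """<?xml version="1.0"?>
<course>
  <cname>C++</cname>
  <teacher>Esa</teacher>
  <audience>
    <student>Pasi</student>
    <student>Pia</student>
  </audience>
</course>"""


def courses_of(student, docs):
    """Return the names of the courses whose audience contains `student`."""
    result = []
    for doc in docs:
        course = ET.fromstring(doc)  # root element is <course>
        # Path expression: every <student> directly under <audience>.
        students = [s.text for s in course.findall("./audience/student")]
        if student in students:
            result.append(course.findtext("cname"))
    return result


if __name__ == "__main__":
    print(courses_of("Pasi", [DOC1, DOC2]))
    print(courses_of("Pia", [DOC1, DOC2]))
```

The path `./audience/student` follows the nesting of the tags, which is the same hierarchy that the tree illustration and the relational mappings have to represent.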
Illustration as tree structures
• Course document 1
  course
    cname: Adv DB
    teacher: Timo
    audience
      student: Pasi
      student: Pirjo
• Course document 2
  course
    cname: C++
    teacher: Esa
    audience
      student: Pasi
      student: Pia

Relational alternative 1: XML data type for a column
Courses-relation
cid  course document
c1   <?xml…?><course><cname>AdvDB</cname><teacher>Timo</teacher><audience><student>Pasi</student><student>Pirjo</student></audience></course>
c2   <?xml version="1.0"?><course><cname>C++</cname><teacher>Esa</teacher><audience><student>Pasi</student><student>Pia</student></audience></course>

Relational alternative 2: Non-typed nodes
Nodes-relation
node-id  element   parent  text-value
n1       course
n2       cname     n1      Adv DB
n3       teacher   n1      Timo
n4       audience  n1
n5       student   n4      Pasi
n6       student   n4      Pirjo
n7       course
n8       cname     n7      C++
…        …         …       …

Relational alternative 3: Typed nodes
Courses
cid  cname   teacher
c1   Adv DB  Timo
c2   C++     Esa

Audience
student  cid
Pasi     c1
Pirjo    c1
Pasi     c2
Pia      c2

Digital libraries
• Organized collection of information ( web)
• Close to multimedia databases, but more focused on information retrieval features
• Two types of users:
  – End users make retrievals
  – Librarians select, organize and maintain the collection.
• Important: Metadata and annotations
• Hard job: digitization of ’real’ libraries

Spatial databases
• Representations: Solid (2D, 3D), boundary, abstract (‘above’, ‘near’, ‘under’, ...)
• Objects: points, line segments, rectangles
• Spatial operations (intersection, nearest neighbor, spatial join, ...)
• Important application area: GIS = Geographic Information System (objects on maps).
• Temporal dimension may be included (movement, order of events)
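
The spatial operations named above (intersection, nearest neighbor, spatial join) can be sketched for points and axis-parallel rectangles as follows. This is only an illustration under simplifying assumptions: the names `Rect`, `intersection`, `nearest_neighbor` and `spatial_join` are hypothetical, and every operation is a brute-force scan, whereas a real spatial database would typically rely on a spatial index such as an R-tree.

```python
import math
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-parallel rectangle given by its lower-left and upper-right corners."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersection(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the common area of two rectangles, or None if they are disjoint.

    Rectangles that only touch along an edge or at a corner count as
    intersecting here; the result is then a degenerate rectangle.
    """
    xmin, ymin = max(a.xmin, b.xmin), max(a.ymin, b.ymin)
    xmax, ymax = min(a.xmax, b.xmax), min(a.ymax, b.ymax)
    if xmin > xmax or ymin > ymax:
        return None
    return Rect(xmin, ymin, xmax, ymax)


def nearest_neighbor(query, points):
    """Return the point closest to `query` in Euclidean distance (linear scan)."""
    return min(points, key=lambda p: math.dist(query, p))


def spatial_join(left, right):
    """Return all pairs (a, b) of rectangles, a from `left` and b from `right`, that intersect."""
    return [(a, b) for a in left for b in right if intersection(a, b) is not None]


if __name__ == "__main__":
    parks = [Rect(0, 0, 2, 2), Rect(5, 5, 6, 6)]
    lakes = [Rect(1, 1, 3, 3)]
    print(intersection(parks[0], lakes[0]))
    print(nearest_neighbor((0, 0), [(3, 4), (1, 1), (-2, 0)]))
    print(spatial_join(parks, lakes))
```

The join is a nested loop over both inputs, which is quadratic; the sketch is meant to show what the operations compute, not how to make them efficient.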

Scientific databases
• Large amounts of observed data (raw, calibrated, validated, derived, interpreted)
• Seldom updated; transaction processing is not needed.
• One form of data warehouse.
• Metadata is crucial
• Example of scientific database: genome and protein data in bioinformatics (sequences, 3D-structures)

Multimedia databases
• Text, hypertext, images, graphics, audio, video
• Applications: Media servers, audio/video-on-demand, document management, educational services, marketing, intelligent systems, digital libraries, medical information systems, etc.
• Issues: Modeling (complex objects), design, storage of large objects (LOBs), compression, retrieval (indexes), performance (critical for audio/video).

Multimedia databases: Required features
• Supports the main types of multimedia (MM) data
• Can handle a very large number of MM objects
• Supports high-performance, high-capacity storage management
• Offers DB capabilities: Persistence, transactions, concurrency control, recovery from failures, querying with high-level declarative constructs, versioning, integrity constraints, security.
• Offers information-retrieval capabilities: Exact-match retrieval, probabilistic (best-match) retrieval, content-based retrieval, ranking of results

Multimedia databases: Functional considerations
• Interactive querying
• Relevance feedback
• Query refinement
• Automatic feature extraction and indexing
• Content- and context-based indexing of different media
• Single- and multidimensional indexing

Multimedia databases: Functional considerations (cont.)
• Clustering of media data on storage devices
• Support for efficient access of very large media objects
• Optimization of multimedia queries and retrieval, supported by sophisticated indexing
• Replication, parallelism, distribution, scalability
• Recent approach: NoSQL databases, with relaxed requirements of consistency, compared to traditional ACID (see Chapter 3)

NoSQL databases
• "Not only SQL"
• "Big Data" applications, e.g. search engines, social media, data streams, observation data
• Traditional relational technology does not scale well to huge amounts of data.
• Typical of NoSQL systems:
  – Requirement for very efficient retrieval
  – Real-time updating can be relaxed
  – Large-scale distribution is required

NoSQL approaches
• Key–value stores
  E.g. DynamoDB (Amazon)
• Column stores
  E.g. BigTable (Google), Cassandra (Apache)
• Graph databases
  E.g. Neo4j (Open-source, Java-based)
• Document stores
  E.g. Native XML databases

End of slides – Remember also the exercises!