8. Special database types

Distributed databases
• Distribution of data:
  – Several host sites
  – Availability and reliability: replicated data
  – Distributed concurrency control
• Distribution of users:
  – Client-server architecture
  – Web databases; three-tier architecture

AdvDB-8 J. Teuhola 2015

Distributed databases: Requirements
• Replication and partitioning of data
• Maintenance of a location map for data
• Query optimization for multiple hosts
• Maintenance of consistency among replicas after update operations
• Recovery from network failures
• Partial usability when some hosts are down
• Management and control of access rights

Distributed databases: Advantages
• Improved efficiency by replication: data close to users, preferably in the local host.
• Improved reliability by replication: When one host is down, others continue to operate. Data is accessible when one copy is available.
• Transparency: The user does not need to know the location of data / replicas / partitions.
• Extensibility: new nodes can be added to the network.

Example: distributed join
• Relation R(X, Y, Z) stored in host A
• Relation S(Z, W) stored in host B
• Steps of natural join R * S for host A:
  – Send column R(Z) from A to B
  – Compute semijoin T(Z, W) = R(Z) * S(Z, W) in B
  – Send relation T back to A
  – Compute the final join R * T
• Note: the last step can be replaced by concatenation if duplicates are maintained in W and T

Deductive (logic) databases
Main features:
• ‘Data’ consists of facts and rules.
• Declarative language to define them
• Inference engine = deduction mechanism for solving queries
Related areas:
• Relational data model (esp. relational calculus)
• Logic programming (Prolog)
• Datalog: Subset of Prolog

Deductive databases: Example in Datalog
Facts: parent(x, y) means that y is x’s parent
parent(peter,mary).
parent(peter,paul).
parent(mary,john).
parent(paul,joan).
Rules: ancestor(x, y) means that y is x’s ancestor
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- parent(X,Z),ancestor(Z,Y).
Queries: (1) ancestors of Peter, (2) descendants of Joan
?- ancestor(peter,?).
?- ancestor(?,joan).

Data warehouses
• Support for decision making.
• Derived, integrated and refined from operational databases.
• No transaction processing, not quite up-to-date.
• Multidimensional view of data (data cube)
• OLAP = On-Line Analytical Processing.
• Summary and multidimensional data.
• Statistical analysis tools.
• Data mining tools.

Example: data cube on sales
• Sales values per salesperson, product and date
[Figure: data cube with axes Salesperson, Date, Product]

Example: ‘Star’ schema for data warehouse
‘Fact table’: Sales
  SalesTable(ProdNo, AreaNo, Date, Amount, Value)
‘Dimension tables’: Prod, Area, Time
  ProdTable(ProdNo, Name, Descr, Group)
  AreaTable(AreaNo, Name, Seller)
  TimeTable(Date, DayOfWeek)

XML databases: ‘semi-structured data’
• Storage and retrieval of XML documents: structured using nested pairs of tags
• Flexible, hierarchical schema
• Alternative implementations for XML databases:
  – Relational database: various alternatives
  – Object database: more direct mapping of the structure
  – Native XML database: built from scratch, tailored especially for this data type
• Query language: XQuery

Example document collection: 2 courses

Course document 1:
<?xml version="1.0"?>
<course>
  <cname>Adv DB</cname>
  <teacher>Timo</teacher>
  <audience>
    <student>Pasi</student>
    <student>Pirjo</student>
  </audience>
</course>

Course document 2:
<?xml version="1.0"?>
<course>
  <cname>C++</cname>
  <teacher>Esa</teacher>
  <audience>
    <student>Pasi</student>
    <student>Pia</student>
  </audience>
</course>
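As a rough illustration of retrieval over the two course documents above, the following Python sketch parses them with the standard library module xml.etree.ElementTree and lists the courses that a given student attends. The names DOCS and courses_of, and the idea of holding the collection in an in-memory list, are assumptions made for this example only; an actual XML database would answer such a query with XQuery rather than application code.

```python
import xml.etree.ElementTree as ET

# The two course documents from the slide, kept in memory for this sketch
# (hypothetical setup; a real XML database stores them persistently).
DOCS = [
    """<?xml version="1.0"?>
<course>
  <cname>Adv DB</cname>
  <teacher>Timo</teacher>
  <audience>
    <student>Pasi</student>
    <student>Pirjo</student>
  </audience>
</course>""",
    """<?xml version="1.0"?>
<course>
  <cname>C++</cname>
  <teacher>Esa</teacher>
  <audience>
    <student>Pasi</student>
    <student>Pia</student>
  </audience>
</course>""",
]


def courses_of(student, docs):
    """Return the names of the courses whose audience contains `student`."""
    result = []
    for text in docs:
        root = ET.fromstring(text)  # root is the <course> element
        # Follow the path course -> audience -> student in the document tree.
        names = [s.text for s in root.findall("./audience/student")]
        if student in names:
            result.append(root.findtext("cname"))
    return result


if __name__ == "__main__":
    print(courses_of("Pasi", DOCS))
```

The path expression "./audience/student" follows the nesting of the tags, which is the hierarchical structure that the relational mappings of XML have to encode explicitly.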
Illustration as tree structures

• Course document 1
course
  cname: Adv DB
  teacher: Timo
  audience
    student: Pasi
    student: Pirjo

• Course document 2
course
  cname: C++
  teacher: Esa
  audience
    student: Pasi
    student: Pia

Relational alternative 1: XML data type for a column

Courses-relation
cid   course document
c1    <?xml…?><course><cname>Adv DB</cname><teacher>Timo</teacher><audience><student>Pasi</student><student>Pirjo</student></audience></course>
c2    <?xml version="1.0"?><course><cname>C++</cname><teacher>Esa</teacher><audience><student>Pasi</student><student>Pia</student></audience></course>

Relational alternative 2: Non-typed nodes

Nodes-relation
node-id   element    parent   text-value
n1        course
n2        cname      n1       Adv DB
n3        teacher    n1       Timo
n4        audience   n1
n5        student    n4       Pasi
n6        student    n4       Pirjo
n7        course
n8        cname      n7       C++
…         …          …        …

Relational alternative 3: Typed nodes

Courses
cid   cname    teacher
c1    Adv DB   Timo
c2    C++      Esa

Audience
student   cid
Pasi      c1
Pirjo     c1
Pasi      c2
Pia       c2

Digital libraries
• Organized collection of information ( web)
• Close to multimedia databases, but more focused on information retrieval features
• Two types of users:
  – End users make retrievals
  – Librarians select, organize and maintain the collection.
• Important: Metadata and annotations
• Hard job: digitization of ‘real’ libraries

Spatial databases
• Representations: Solid (2D, 3D), boundary, abstract (‘above’, ‘near’, ‘under’, ...)
• Objects: points, line segments, rectangles
• Spatial operations (intersection, nearest neighbor, spatial join, ...)
• Important application area: GIS = Geographic Information System (objects on maps).
• Temporal dimension may be included (movement, order of events)
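Two of the spatial operations listed above, intersection and nearest neighbor, can be sketched in Python for the simplest object types (axis-aligned rectangles and 2D points). The Rect class and the function names are assumptions made for this illustration, and the nearest-neighbor search is a plain linear scan; a spatial database would normally rely on a spatial index instead.

```python
from dataclasses import dataclass
from math import dist
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle given by lower-left (x1, y1) and upper-right (x2, y2)."""
    x1: float
    y1: float
    x2: float
    y2: float


def intersection(a: Rect, b: Rect) -> Optional[Rect]:
    """Return the common area of a and b, or None if they do not meet.

    Rectangles that only touch along an edge or at a corner yield a
    degenerate rectangle of zero width and/or height.
    """
    x1, y1 = max(a.x1, b.x1), max(a.y1, b.y1)
    x2, y2 = min(a.x2, b.x2), min(a.y2, b.y2)
    if x1 > x2 or y1 > y2:
        return None
    return Rect(x1, y1, x2, y2)


def nearest_neighbor(query, points):
    """Return the point closest to `query` (Euclidean distance, linear scan)."""
    return min(points, key=lambda p: dist(p, query))


def spatial_join(rects_a, rects_b):
    """Return all pairs (a, b) whose rectangles intersect (nested-loop join)."""
    return [(a, b) for a in rects_a for b in rects_b
            if intersection(a, b) is not None]


if __name__ == "__main__":
    print(intersection(Rect(0, 0, 2, 2), Rect(1, 1, 3, 3)))
    print(nearest_neighbor((0, 0), [(3, 4), (1, 1), (-2, 0)]))
```

The nested-loop spatial join compares every pair of rectangles, so its cost grows with the product of the two input sizes; that cost is the usual motivation for index-based join methods.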
Scientific databases
• Large amounts of observed data (raw, calibrated, validated, derived, interpreted)
• Seldom updated; transaction processing is not needed.
• One form of data warehouse.
• Metadata is crucial.
• Example of scientific database: genome and protein data in bioinformatics (sequences, 3D-structures)

Multimedia databases
• Text, hypertext, images, graphics, audio, video
• Applications: Media servers, audio/video-on-demand, document management, educational services, marketing, intelligent systems, digital libraries, medical information systems, etc.
• Issues: Modeling (complex objects), design, storage of large objects (LOBs), compression, retrieval (indexes), performance (critical for audio/video).

Multimedia databases: Required features
• Supports the main types of multimedia (MM) data
• Can handle a very large number of MM objects
• Supports high-performance, high-capacity storage management
• Offers DB capabilities: Persistence, transactions, concurrency control, recovery from failures, querying with high-level declarative constructs, versioning, integrity constraints, security.
• Offers information-retrieval capabilities: Exact-match retrieval, probabilistic (best-match) retrieval, content-based retrieval, ranking of results

Multimedia databases: Functional considerations
• Interactive querying
• Relevance feedback
• Query refinement
• Automatic feature extraction and indexing
• Content- and context-based indexing of different media
• Single- and multidimensional indexing

Multimedia databases: Functional considerations (cont.)
• Clustering of media data on storage devices
• Support for efficient access of very large media objects
• Optimization of multimedia queries and retrieval, supported by sophisticated indexing
• Replication, parallelism, distribution, scalability
• Recent approach: NoSQL databases, with relaxed requirements of consistency, compared to traditional ACID (see Chapter 3)

NoSQL databases
• “Not only SQL”
• “Big Data” applications, e.g. search engines, social media, data streams, observation data
• Traditional relational technology does not scale well to huge amounts of data.
• Typical of NoSQL systems:
  – Requirement for very efficient retrieval
  – Real-time updating can be relaxed
  – Large-scale distribution is required

NoSQL approaches
• Key-value stores
  E.g. DynamoDB (Amazon)
• Column stores
  E.g. BigTable (Google), Cassandra (Apache)
• Graph databases
  E.g. Neo4j (open-source, Java-based)
• Document stores
  E.g. native XML databases

End of slides. Remember also the exercises!