8. Special database types
Distributed databases
• Distribution of data:
– Several host sites.
– Availability and reliability: replicated data
– Distributed concurrency control
• Distribution of users:
– Client-server architecture
– Web databases; three-tier architecture
AdvDB-8
J. Teuhola 2015
Distributed databases: Requirements
• Replication and partitioning of data
• Maintenance of a location map for data
• Query optimization for multiple hosts
• Maintenance of consistency among replicas
after update operations
• Recovery from network failures
• Partial usability when some hosts are down
• Management and control of access rights
Distributed databases: Advantages
• Improved efficiency by replication: data close
to users, preferably in the local host.
• Improved reliability by replication: When one
host is down, others continue to operate. Data
is accessible as long as at least one copy is available.
• Transparency: The user does not need to know
the location of data / replicas / partitions.
• Extensibility: new nodes can be added to the
network.
Example: distributed join
• Relation R(X, Y, Z) stored in host A
• Relation S(Z, W) stored in host B
• Steps of natural join R * S for host A:
– Send column R(Z) from A to B
– Compute semijoin T(Z, W) = R(Z) * S(Z, W) in B
– Send relation T back to A
– Compute the final join R * T
• Note: the last step can be replaced by
concatenation if duplicates are maintained in
W and T
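The four steps above can be sketched in Python. This is only an illustration: both "hosts" are plain lists in one process, the sample tuples are made up, and the function names are hypothetical.

```python
# Hypothetical sample data: R(X, Y, Z) lives on host A, S(Z, W) on host B.
R = [("x1", "y1", 1), ("x2", "y2", 2), ("x3", "y3", 4)]
S = [(1, "w1"), (2, "w2"), (3, "w3")]


def project_z(r):
    """Host A: project R onto the join column Z (duplicates removed)."""
    return {z for (_, _, z) in r}


def semijoin_at_b(r_z, s):
    """Host B: keep only the S tuples whose Z value occurs in R(Z)."""
    return [(z, w) for (z, w) in s if z in r_z]


def final_join_at_a(r, t):
    """Host A: natural join of R with the reduced relation T on Z."""
    return [(x, y, z, w) for (x, y, z) in r for (tz, w) in t if z == tz]


r_z = project_z(R)               # step 1: R(Z) is shipped from A to B
T = semijoin_at_b(r_z, S)        # step 2: T is computed in B
result = final_join_at_a(R, T)   # steps 3-4: T is shipped to A and joined with R
print(result)
```

Only the Z column and the matching S tuples cross the network, which is the point of the semijoin: the tuple (3, "w3") never leaves host B.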
Deductive (logic) databases
Main features:
• ‘Data’ consists of facts and rules.
• Declarative language to define them
• Inference engine = deduction mechanism for
solving queries
Related areas:
• Relational data model (esp. relational calculus)
• Logic programming (Prolog)
• Datalog: Subset of Prolog
Deductive databases: Example in Datalog
Facts: parent(x, y) means that y is x’s parent
parent(peter,mary).
parent(peter,paul).
parent(mary,john).
parent(paul,joan).
Rules: ancestor(x, y) means that y is x’s ancestor
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- parent(X,Z),ancestor(Z,Y).
Queries: (1) ancestors of Peter, (2) descendants of Joan
?- ancestor(peter,?).
?- ancestor(?,joan).
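How an inference engine might answer these queries can be sketched in Python with naive bottom-up evaluation: apply the rules repeatedly until no new ancestor facts appear. This is one possible evaluation strategy, not necessarily the one a given Datalog system uses, and the function name is hypothetical.

```python
# Facts from the slide: parent(x, y) means that y is x's parent.
parent = {
    ("peter", "mary"),
    ("peter", "paul"),
    ("mary", "john"),
    ("paul", "joan"),
}


def ancestor_facts(parent_facts):
    """Derive all ancestor(X, Y) facts by iterating the rules to a fixpoint."""
    # Rule 1: ancestor(X, Y) :- parent(X, Y).
    ancestor = set(parent_facts)
    while True:
        # Rule 2: ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
        derived = {
            (x, y)
            for (x, z) in parent_facts
            for (z2, y) in ancestor
            if z == z2
        }
        if derived <= ancestor:      # nothing new: fixpoint reached
            return ancestor
        ancestor |= derived


ancestor = ancestor_facts(parent)

# Query 1: ancestors of Peter.
ancestors_of_peter = sorted(y for (x, y) in ancestor if x == "peter")
# Query 2: descendants of Joan.
descendants_of_joan = sorted(x for (x, y) in ancestor if y == "joan")

print(ancestors_of_peter)    # ['joan', 'john', 'mary', 'paul']
print(descendants_of_joan)   # ['paul', 'peter']
```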
Data warehouses
• Support for decision making.
• Derived, integrated and refined from operational
databases.
• No transaction processing, not quite up-to-date.
• Multidimensional view of data (data cube)
• OLAP = On-Line Analytical Processing.
• Summary and multidimensional data.
• Statistical analysis tools.
• Data mining tools.
Example: data cube on sales
• Sales values per salesperson, product and date
[Figure: data cube with axes Salesperson, Date and Product]
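A small Python sketch of the data cube idea: each cell is addressed by (salesperson, product, date), and a roll-up sums the cells over the dimensions that are dropped. The cell values and the function name roll_up are invented for illustration.

```python
from collections import defaultdict

# Hypothetical cube cells: (salesperson, product, date) -> sales value.
cube = {
    ("Ann", "pen", "2015-01-01"): 10,
    ("Ann", "ink", "2015-01-01"): 5,
    ("Bob", "pen", "2015-01-01"): 7,
    ("Bob", "pen", "2015-01-02"): 3,
}
DIMENSIONS = ("salesperson", "product", "date")


def roll_up(cells, keep):
    """Sum the sales values over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[tuple(key[i] for i in positions)] += value
    return dict(totals)


by_person = roll_up(cube, ["salesperson"])
by_product_and_date = roll_up(cube, ["product", "date"])
grand_total = roll_up(cube, [])

print(by_person)      # {('Ann',): 15, ('Bob',): 10}
print(grand_total)    # {(): 25}
```

A real OLAP system would typically precompute and store many of these summaries rather than scan the cells for each request.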
Example: ‘Star’ schema for data warehouse
‘Fact table’: Sales
SalesTable(ProdNo, AreaNo, Date, Amount, Value)
‘Dimension tables’: Prod, Area, Time
ProdTable(ProdNo, Name, Descr, Group)
AreaTable(AreaNo, Name, Seller)
TimeTable(Date, DayOfWeek)
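A typical star-schema query joins the fact table with some of its dimension tables and aggregates the result. The sketch below uses Python's built-in sqlite3 module with invented rows. The product group column is named ProdGroup here, because GROUP is a reserved word in SQL; that renaming is an assumption of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProdTable (ProdNo INTEGER PRIMARY KEY, Name TEXT, Descr TEXT,
                        ProdGroup TEXT);
CREATE TABLE AreaTable (AreaNo INTEGER PRIMARY KEY, Name TEXT, Seller TEXT);
CREATE TABLE TimeTable (Date TEXT PRIMARY KEY, DayOfWeek TEXT);
CREATE TABLE SalesTable (
    ProdNo INTEGER REFERENCES ProdTable(ProdNo),
    AreaNo INTEGER REFERENCES AreaTable(AreaNo),
    Date   TEXT    REFERENCES TimeTable(Date),
    Amount INTEGER,
    Value  REAL
);
-- Hypothetical sample rows.
INSERT INTO ProdTable VALUES (1, 'pen', 'ballpoint pen', 'office'),
                             (2, 'ink', 'ink bottle', 'office');
INSERT INTO AreaTable VALUES (10, 'North', 'Ann'), (20, 'South', 'Bob');
INSERT INTO TimeTable VALUES ('2015-01-05', 'Mon'), ('2015-01-06', 'Tue');
INSERT INTO SalesTable VALUES (1, 10, '2015-01-05', 3, 6.0),
                              (1, 20, '2015-01-05', 1, 2.0),
                              (2, 10, '2015-01-06', 2, 9.0),
                              (1, 10, '2015-01-06', 4, 8.0);
""")

# Total sales value per product and day of week: the fact table is joined
# with two of its dimension tables, then grouped.
rows = conn.execute("""
    SELECT p.Name, t.DayOfWeek, SUM(s.Value)
    FROM SalesTable s
    JOIN ProdTable p ON p.ProdNo = s.ProdNo
    JOIN TimeTable t ON t.Date = s.Date
    GROUP BY p.Name, t.DayOfWeek
    ORDER BY p.Name, t.DayOfWeek
""").fetchall()
print(rows)   # [('ink', 'Tue', 9.0), ('pen', 'Mon', 8.0), ('pen', 'Tue', 8.0)]
```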
XML databases: ‘semi-structured data’
• Storage and retrieval of XML documents:
structured using nested pairs of tags
• Flexible, hierarchical schema
• Alternative implementations for XML
databases:
– Relational database: various alternatives
– Object database: more direct mapping of the
structure
– Native XML database: built from scratch, tailored
especially for this data type
• Query Language: XQuery
Example document collection: 2 courses
<?xml version="1.0"?>
<course>
  <cname>Adv DB</cname>
  <teacher>Timo</teacher>
  <audience>
    <student>Pasi</student>
    <student>Pirjo</student>
  </audience>
</course>

<?xml version="1.0"?>
<course>
  <cname>C++</cname>
  <teacher>Esa</teacher>
  <audience>
    <student>Pasi</student>
    <student>Pia</student>
  </audience>
</course>
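To show what a query over this collection might look like, here is a sketch using Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. It is a stand-in for illustration only, not XQuery, and the function name courses_of is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two course documents from the slide, as strings.
DOCS = [
    '<?xml version="1.0"?>'
    "<course><cname>Adv DB</cname><teacher>Timo</teacher>"
    "<audience><student>Pasi</student><student>Pirjo</student></audience>"
    "</course>",
    '<?xml version="1.0"?>'
    "<course><cname>C++</cname><teacher>Esa</teacher>"
    "<audience><student>Pasi</student><student>Pia</student></audience>"
    "</course>",
]


def courses_of(student, docs):
    """Return the names of the courses whose audience contains `student`."""
    names = []
    for text in docs:
        course = ET.fromstring(text)
        students = [s.text for s in course.findall("./audience/student")]
        if student in students:
            names.append(course.findtext("cname"))
    return names


print(courses_of("Pasi", DOCS))   # ['Adv DB', 'C++']
print(courses_of("Pia", DOCS))    # ['C++']
```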
Illustration as tree structures
• Course document 1
course
  cname: Adv DB
  teacher: Timo
  audience
    student: Pasi
    student: Pirjo
• Course document 2
course
  cname: C++
  teacher: Esa
  audience
    student: Pasi
    student: Pia
Relational alternative 1:
XML data type for a column
Courses relation
cid  course document
c1   <?xml…?><course><cname>Adv DB</cname><teacher>Timo</teacher><audience><student>Pasi</student><student>Pirjo</student></audience></course>
c2   <?xml version="1.0"?><course><cname>C++</cname><teacher>Esa</teacher><audience><student>Pasi</student><student>Pia</student></audience></course>
Relational alternative 2:
Non-typed nodes
Nodes relation ("-" marks an empty value)
node-id  element   parent  text-value
n1       course    -       -
n2       cname     n1      Adv DB
n3       teacher   n1      Timo
n4       audience  n1      -
n5       student   n4      Pasi
n6       student   n4      Pirjo
n7       course    -       -
n8       cname     n7      C++
…        …         …       …
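One way such a Nodes relation could be produced is to walk each document tree in preorder and emit one row per element. The sketch below does this with Python's xml.etree.ElementTree. The function name shred and the numbering scheme are assumptions chosen to match the node ids shown on the slide.

```python
import xml.etree.ElementTree as ET

# The two course documents from the earlier slide, as strings.
DOCS = [
    "<course><cname>Adv DB</cname><teacher>Timo</teacher>"
    "<audience><student>Pasi</student><student>Pirjo</student></audience>"
    "</course>",
    "<course><cname>C++</cname><teacher>Esa</teacher>"
    "<audience><student>Pasi</student><student>Pia</student></audience>"
    "</course>",
]


def shred(docs):
    """Return rows (node_id, element, parent, text_value), one per element."""
    rows = []

    def visit(elem, parent_id):
        node_id = f"n{len(rows) + 1}"              # ids follow preorder
        text = (elem.text or "").strip() or None   # None for pure containers
        rows.append((node_id, elem.tag, parent_id, text))
        for child in elem:
            visit(child, node_id)

    for text in docs:
        visit(ET.fromstring(text), None)           # a root has no parent
    return rows


nodes = shred(DOCS)
for row in nodes[:8]:
    print(row)
```

Because every element becomes a row of the same shape, this mapping works for any document, at the cost of needing one self-join per level when a query follows a path through the tree.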
Relational alternative 3:
Typed nodes
Courses
cid  cname   teacher
c1   Adv DB  Timo
c2   C++     Esa

Audience
student  cid
Pasi     c1
Pirjo    c1
Pasi     c2
Pia      c2
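With typed nodes, a question about the documents becomes an ordinary relational join. The following sketch loads the two relations into an in-memory SQLite database using Python's sqlite3 module. The column types and the key declarations are assumptions, since the slide gives only the column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Courses (cid TEXT PRIMARY KEY, cname TEXT, teacher TEXT);
CREATE TABLE Audience (student TEXT, cid TEXT REFERENCES Courses(cid));
INSERT INTO Courses VALUES ('c1', 'Adv DB', 'Timo'), ('c2', 'C++', 'Esa');
INSERT INTO Audience VALUES ('Pasi', 'c1'), ('Pirjo', 'c1'),
                            ('Pasi', 'c2'), ('Pia', 'c2');
""")

# Which courses does Pasi attend, and who teaches them?
pasi_courses = conn.execute(
    """
    SELECT c.cname, c.teacher
    FROM Courses c
    JOIN Audience a ON a.cid = c.cid
    WHERE a.student = ?
    ORDER BY c.cid
    """,
    ("Pasi",),
).fetchall()
print(pasi_courses)   # [('Adv DB', 'Timo'), ('C++', 'Esa')]
```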
Digital libraries
• Organized collection of information ( web)
• Close to multimedia databases, but more
focused on information retrieval features
• Two types of users:
– End users make retrievals
– Librarians select, organize and maintain the collection.
• Important: Metadata and annotations
• Hard job: digitization of ‘real’ libraries
Spatial databases
• Representations: Solid (2D, 3D), boundary,
abstract (‘above’, ‘near’, ‘under’, ...)
• Objects: points, line segments, rectangles
• Spatial operations (intersection, nearest
neighbor, spatial join, ...)
• Important application area:
GIS = Geographic Information system
(objects on maps).
• Temporal dimension may be included
(movement, order of events)
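The spatial operations listed above can be illustrated with brute-force Python versions for points and axis-aligned rectangles. Real spatial databases would normally use spatial indexes instead of scanning every object. All names and sample values here are hypothetical.

```python
import math
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle given by two opposite corners."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a, b):
    """True if the rectangles overlap (touching edges count as overlap)."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax
            and a.ymin <= b.ymax and b.ymin <= a.ymax)


def nearest_neighbor(query, points):
    """Return the point with the smallest Euclidean distance to `query`."""
    return min(points, key=lambda p: math.dist(query, p))


def spatial_join(left, right):
    """Return index pairs (i, j) where left[i] intersects right[j]."""
    return [(i, j)
            for i, a in enumerate(left)
            for j, b in enumerate(right)
            if intersects(a, b)]


parks = [Rect(0, 0, 2, 2), Rect(5, 5, 6, 6)]
lakes = [Rect(1, 1, 3, 3), Rect(5.5, 5.5, 7, 7)]
pairs = spatial_join(parks, lakes)
closest = nearest_neighbor((0, 0), [(3, 4), (1, 1), (-2, 0)])
print(pairs)     # [(0, 0), (1, 1)]
print(closest)   # (1, 1)
```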
Scientific databases
• Large amounts of observed data (raw,
calibrated, validated, derived, interpreted)
• Seldom updated; transaction processing is not
needed.
• One form of data warehouse.
• Metadata is crucial
• Example of scientific database: genome and
protein data in bioinformatics (sequences,
3D-structures)
Multimedia databases
• Text, hypertext, images, graphics, audio, video
• Applications: Media servers, audio/video-on-demand,
document management, educational
services, marketing, intelligent systems, digital
libraries, medical information systems, etc.
• Issues: Modeling (complex objects), design,
storage of large objects (LOBs), compression,
retrieval (indexes), performance (critical for
audio/video).
Multimedia databases: Required features
• Supports the main types of multimedia (MM) data
• Can handle a very large number of MM objects
• Supports high-performance, high-capacity storage
management
• Offers DB capabilities: Persistence, transactions,
concurrency control, recovery from failures, querying
with high-level declarative constructs, versioning,
integrity constraints, security.
• Offers information-retrieval capabilities: Exact-match
retrieval, probabilistic (best-match) retrieval,
content-based retrieval, ranking of results
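As a rough illustration of best-match retrieval with ranking, the sketch below scores stored objects against a query by cosine similarity of feature vectors and returns the top matches. The feature vectors, object names and the choice of cosine similarity are all assumptions made for this example; the slides do not prescribe a particular similarity measure.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query, library, k=2):
    """Return the names of the k objects most similar to the query."""
    scored = sorted(library.items(),
                    key=lambda item: cosine(query, item[1]),
                    reverse=True)
    return [name for name, _ in scored[:k]]


# Hypothetical 3-dimensional features, e.g. shares of red, green and blue.
library = {
    "sunset.jpg": (0.9, 0.4, 0.1),
    "forest.jpg": (0.1, 0.8, 0.2),
    "ocean.jpg": (0.1, 0.3, 0.9),
}
best = rank((1.0, 0.5, 0.0), library)
print(best)   # ['sunset.jpg', 'forest.jpg']
```

Unlike exact-match retrieval, every object receives a score, so the result is an ordered list rather than a yes/no answer.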
Multimedia databases: Functional
considerations
• Interactive querying
• Relevance feedback
• Query refinement
• Automatic feature extraction and indexing
• Content- and context-based indexing of
different media
• Single- and multidimensional indexing
Multimedia databases: Functional
considerations (cont.)
• Clustering of media data on storage devices
• Support for efficient access of very large
media objects
• Optimization of multimedia queries and
retrieval, supported by sophisticated indexing
• Replication, parallelism, distribution,
scalability
• Recent approach: NoSQL databases, with
relaxed consistency requirements
compared to traditional ACID (see Chapter 3)
NoSQL databases
• “Not only SQL”
• “Big Data” applications, e.g. search engines,
social media, data streams, observation data
• Traditional relational technology does not
scale well to huge amounts of data.
• Typical of NoSQL systems:
– Requirement for very efficient retrieval
– Real-time updating can be relaxed
– Large-scale distribution is required
NoSQL approaches
• Key–value stores
– E.g. DynamoDB (Amazon)
• Column stores
– E.g. Bigtable (Google), Cassandra (Apache)
• Graph databases
– E.g. Neo4j (open-source, Java-based)
• Document stores
– E.g. native XML databases
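The key–value approach combined with large-scale distribution can be sketched as a toy store that hashes each key to a node and copies the value to the next node as a replica. This is a simplified illustration with hypothetical names. It does not describe how DynamoDB or any other listed product works internally.

```python
import hashlib


class ShardedKV:
    """Toy key-value store: keys are hash-partitioned over in-memory nodes."""

    def __init__(self, n_nodes, replicas=2):
        self.nodes = [dict() for _ in range(n_nodes)]
        self.replicas = replicas

    def _owners(self, key):
        """Indexes of the nodes that hold the key (primary first)."""
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        first = int(digest, 16) % len(self.nodes)
        return [(first + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        """Write the value to every replica of the key."""
        for n in self._owners(key):
            self.nodes[n][key] = value

    def get(self, key, down=()):
        """Read from the first reachable replica; `down` lists failed nodes."""
        for n in self._owners(key):
            if n not in down and key in self.nodes[n]:
                return self.nodes[n][key]
        raise KeyError(key)


store = ShardedKV(n_nodes=3)
store.put("user:1", "Pasi")
primary = store._owners("user:1")[0]
print(store.get("user:1"))                   # Pasi
print(store.get("user:1", down={primary}))   # Pasi, served by the replica
```

Here put writes to all replicas synchronously. A system that relaxes real-time updating would instead let replicas catch up later, trading consistency for speed.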
End of slides –
Remember also the exercises!