A Data Model for Multidatabases:
Don’t Integrate, Coordinate!
John Mylopoulos
Department of Computer Science
University of Toronto
Luciano Serafini and Fausto Giunchiglia
Department of Computer Science
University of Trento
DRAFT December 17, 2000
The Multidatabase Model -- 1
A Motivating Example
• Consider a company database:
  Cust(name,addr,phone)
  Sales(custName,prod#,price,amount,date)
  Prod(prod#,name,price,inStock)
• Each salesperson leaving on a trip downloads parts of the Prod, Sales and Cust relations. During the trip, they update customer and sales information.
• Each of these databases evolves autonomously from the original, and there is no global manager. However, we’d like to enforce coordination rules, such as:
  “Updates to a customer address must be propagated to other databases.”
Managing Data in Multidatabases
• In such a context, it makes sense to assume that the databases participating in a multidatabase coalition may be disconnected for part of the time.
  Unavailability is not a failure, but a fact of life.
• Nevertheless, we’d like to be able to perform some forms of (soft) constraint enforcement as well as weak forms of distributed query processing.
• In addition, we’d like our model to rest on a logical foundation, much like the Relational Model (e.g., [Reiter84]).
Outline
• The rest of the talk covers the following topics:
  • Coordination rules;
  • Correspondence rules and query processing;
  • A formal semantics for the multidatabase problem;
  • Directions for further research.
Coordination Rules
• These are soft inter-database constraints. They are checked every time one of the relevant databases is updated, and are enforced through some protocol; for example:
  • Master:Cust(n,x) and Paolo:Cust(n,y) --> x=y
    propagate last
    /* the latest address is propagated to the other database */
  • Master:Prod(p#,p) and Paolo:Prod(p#,p’) --> p=p’
    propagate (Master->Paolo)
    /* when there is a discrepancy, the Master copy's prices are always propagated to the other databases, never the other way around */
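A minimal Python sketch of the two restoration modalities used above, under stated assumptions: each local copy of a relation is modeled as a dict from a key (customer name, product number) to a hypothetical `Entry` carrying the value and a logical update timestamp, and both databases are taken to be connected when the rule is checked. The names `Entry`, `propagate_last` and `propagate_directed` are illustrative only, not part of the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    value: object    # e.g. a customer address or a product price
    updated_at: int  # hypothetical logical timestamp of the last local update


def propagate_last(db_a: dict, db_b: dict) -> None:
    """Restore A:Cust(n,x) and B:Cust(n,y) --> x=y by copying the most recent value."""
    for name in db_a.keys() & db_b.keys():  # customers known to both databases
        a, b = db_a[name], db_b[name]
        if a.value != b.value:  # the coordination rule is violated for this customer
            latest = a if a.updated_at >= b.updated_at else b
            db_a[name] = db_b[name] = latest  # safe to share: Entry is immutable


def propagate_directed(src: dict, dst: dict) -> None:
    """Restore src:Prod(p#,p) and dst:Prod(p#,p') --> p=p' by overwriting dst (src -> dst)."""
    for key in src.keys() & dst.keys():
        if src[key].value != dst[key].value:
            dst[key] = src[key]  # never the other way around


master = {"Rossi": Entry("Via Roma 1", updated_at=5)}
paolo = {"Rossi": Entry("Via Verdi 2", updated_at=9)}
propagate_last(master, paolo)
print(master["Rossi"].value)  # Via Verdi 2: Paolo's update is the latest
```

Ties in `updated_at` are resolved in favour of the first database here; the slides leave tie-breaking open.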
Expressions are Relative
• All expressions appearing in a coordination rule are relative to one of the participating databases, e.g., Master:Cust(n,x), Paolo:Cust(n,y)
• Expressions with no associated database are shorthands; e.g., in the rule
  M:Cust(n,x) and P:Cust(n,y) --> x=y
  ‘x=y’ is a shorthand for ‘M:x=y and P:x=y’.
More Coordination Rules
• Luciano:TravelB=x and Paolo:TravelB=y and Fausto:TravelB=z --> x+y+z=15MLit
  equi-distribute
  /* if their total budget x+y+z exceeds 15MLit, reduce each budget by (x+y+z - 15MLit)/3 */
• Master:Prod(p#,n) and Paolo:Prod(p#,n’) --> n=n’
  undo
  /* no updates allowed to product names */
• Master:Prod(p#,.) == Paolo:Prod(p#,.)
  propagate (Master->Paolo)
  /* propagation of values always proceeds from the Master to Paolo’s database */
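The equi-distribute arithmetic can be checked with a small sketch. It assumes, following the comment in the rule, that only a total above the limit triggers restoration; the budget figures and the function name `equi_distribute` are invented for illustration.

```python
def equi_distribute(budgets: dict, limit: float) -> dict:
    """Restore sum(budgets) = limit by reducing every budget by the same amount.

    Assumption: only an excess over the limit triggers a reduction.
    """
    excess = sum(budgets.values()) - limit
    if excess <= 0:
        return dict(budgets)  # rule treated as satisfied; nothing to restore
    share = excess / len(budgets)  # (x+y+z - 15MLit)/3 for three participants
    return {name: amount - share for name, amount in budgets.items()}


# Budgets in millions of lire (illustrative figures, not from the slides)
travel = {"Luciano": 6.0, "Paolo": 5.0, "Fausto": 7.0}  # total 18.0
print(equi_distribute(travel, 15.0))
# {'Luciano': 5.0, 'Paolo': 4.0, 'Fausto': 6.0}
```

Nothing in this sketch stops a budget from going negative when the excess is large; the slides leave such cases open.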
Restoration Modality
• The last part of each coordination rule in the examples on the previous slides describes the restoration modality of the rule, i.e., the means by which the rule is enforced when it is violated.
• For example:
  • Propagate last -- propagate the last update;
  • Undo -- undo the last update;
  • Equi-distribute -- restore a numerical sum constraint by reducing each of the participating variables by the same amount;
  • Propagate (A -> B) -- update B to make it consistent with A;
  • ...
Enforcement Protocol
• The enforcement protocol can be characterized along (at least) two dimensions:
  • When rules are enforced (ASAP, periodically, ...);
  • What force they have over the participating databases (constraints, guidelines, suggestions, ...).
• The protocol includes an optional restoration action, and may also include one or more follow-up actions, e.g., amending a coordination rule, deleting a rule, adding more rules, ...
• It is important to stress that, since the databases are assumed to be autonomous, a database may refuse to enforce a coordination rule in particular cases, and/or the rule may be amended to reflect a new coordination arrangement.
Acquaintances
• Each database has zero or more acquaintances; these are other databases with which it can share coordination rules.
• Each database keeps track of its acquaintances through the Acq relation:
  Acq(name,eAddr,owner,startDate)
• In general, there is no central coordination, no global schema, and no database knows all the participating databases.
Coordination Rules, Again
• Each coordination rule is expressed locally, using the database names found in the Acq relation.
• Since a database may be known by different names in different databases, the same coordination rule will take slightly different forms depending on the database with respect to which it is expressed.
• The coordination rules for a given database are stored in the CR relation:
  CR(name,expr,startDate)
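As a rough illustration, the Acq and CR relations can be ordinary tables in the local database. The sketch assumes an SQL store (Python's built-in `sqlite3`, in memory, standing in for Paolo's database), TEXT columns, rule expressions kept as plain strings, and a placeholder address; `unknown_databases` is a hypothetical helper that flags rule expressions mentioning a database name that is not in the local Acq relation.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for Paolo's local database
conn.executescript("""
    CREATE TABLE Acq (name TEXT PRIMARY KEY, eAddr TEXT, owner TEXT, startDate TEXT);
    CREATE TABLE CR  (name TEXT PRIMARY KEY, expr TEXT, startDate TEXT);
""")
conn.execute("INSERT INTO Acq VALUES (?, ?, ?, ?)",
             ("Master", "master.example.org", "Company", "2000-12-01"))
conn.execute("INSERT INTO CR VALUES (?, ?, ?)",
             ("custAddr",
              "Master:Cust(n,x) and Paolo:Cust(n,y) --> x=y propagate last",
              "2000-12-01"))


def unknown_databases(conn, expr: str, self_name: str) -> set:
    """Database names used in a rule that are neither this database nor an acquaintance."""
    used = set(re.findall(r"\b([A-Za-z]\w*):", expr))  # prefixes such as "Master:"
    known = {row[0] for row in conn.execute("SELECT name FROM Acq")}
    return used - known - {self_name}


for name, expr in conn.execute("SELECT name, expr FROM CR").fetchall():
    print(name, unknown_databases(conn, expr, "Paolo"))  # custAddr set()
```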
Cooperation Rules
• These are coordination rules involving the two special relations Acq and CR.
• For example, we can say that A and B have the same acquaintances:
  A:Acq(x,y,z) == B:Acq(x,y,z)
  propagate (B -> A)
  or that A has all the coordination rules of B:
  A:CR(nm,e) == B:CR(nm,e)
  propagate last
Correspondences
• A correspondence relation specifies how the symbols used in one database translate to symbols used in another database.
• A correspondence relation is defined at three levels:
  • Constant to constant, e.g., ‘one’ --> ‘uno’; or CAN$1.00 --> US$0.65
  • Relation to relation, e.g., Cust --> Customer
  • Relational attribute to relational attribute, e.g., name(Cust) --> nm(Customer)
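The three levels of correspondence can be pictured as lookup tables. This sketch assumes that symbols without an entry are passed through unchanged, and it only handles table lookups; a value conversion such as CAN$1.00 --> US$0.65 would need a function instead. `Correspondence` and `translate_selection` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Correspondence:
    """Symbol translation from DB_i's vocabulary to DB_j's (hypothetical structure)."""
    constants: dict = field(default_factory=dict)   # constant -> constant
    relations: dict = field(default_factory=dict)   # relation -> relation
    attributes: dict = field(default_factory=dict)  # (relation, attribute) -> attribute

    def translate_selection(self, relation: str, attribute: str, constant):
        """Translate the condition relation.attribute = constant; unmapped symbols are kept as-is."""
        return (self.relations.get(relation, relation),
                self.attributes.get((relation, attribute), attribute),
                self.constants.get(constant, constant))


corr_ij = Correspondence(
    constants={"one": "uno"},
    relations={"Cust": "Customer"},
    attributes={("Cust", "name"): "nm"},
)
print(corr_ij.translate_selection("Cust", "name", "one"))  # ('Customer', 'nm', 'uno')
```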
Correspondence Rules
• Correspondences between databases i and j are defined in terms of two possibly multi-valued and/or partial functions rij and rji.
• Note that rij and rji need not be symmetric, i.e., in general rji(rij(d)) ≠ d
  (...the “change bureau” phenomenon...)
• For example, consider DBi containing length measurements in meters and DBj containing them in kilometers. One can have
  • rij(x) = roundToClosestK(x), e.g., rij(653)=1, rij(453)=0
  • rji(x) = x*1000, e.g., rji(1)=1000
  so that rji(rij(653)) = 1000 ≠ 653.
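The meters/kilometers example can be run directly. Lengths are assumed to be non-negative integers, and half-up rounding is an assumption, since the slide does not say how roundToClosestK breaks ties.

```python
def r_ij(meters: int) -> int:
    """Meters (DB_i) to kilometers (DB_j), rounding to the closest kilometer.

    Half-up rounding for non-negative integers is an assumption.
    """
    return (meters + 500) // 1000


def r_ji(km: int) -> int:
    """Kilometers (DB_j) to meters (DB_i)."""
    return km * 1000


print(r_ij(653), r_ij(453), r_ji(1))  # 1 0 1000
print(r_ji(r_ij(653)))                # 1000, not 653: the round trip loses information
```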
Local vs. Global Queries
• Local queries are evaluated by the DBMS managing the local database.
• A local-global query expressed in DBi involves only terms used in Li, but is translated using correspondence rules so that it can be evaluated with respect to DBi and all the databases in the transitive closure of the acquaintance relationship.
• A global-local query involves a general wff that mentions several databases. Such a query is evaluated by evaluating each local query of the form i:f(x1,...,xn) with respect to DBi.
• Finally, global-global queries involve general expressions in which each local query is evaluated as a local-global query.
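A possible first step for a local-global query is to find, along the acquaintance graph, every database that can be reached and the local name under which each one knows the queried relation. The sketch below assumes only relation-level correspondences, takes the first translation found by a breadth-first walk, and does not evaluate or merge the resulting local queries; `plan_local_global`, `acq` and `rel_map` are hypothetical.

```python
from collections import deque


def plan_local_global(start: str, relation: str, acq: dict, rel_map: dict) -> dict:
    """Map each database reachable from `start` via acquaintances to its name for `relation`.

    acq:     database -> set of acquaintances
    rel_map: (i, j) -> {relation name in DB_i: relation name in DB_j}
    A database is included only if the relation can be translated along some path.
    """
    plan = {start: relation}
    queue = deque([start])
    while queue:  # breadth-first walk over the acquaintance graph
        i = queue.popleft()
        for j in acq.get(i, ()):
            if j in plan:
                continue
            translated = rel_map.get((i, j), {}).get(plan[i])
            if translated is None:
                continue  # no correspondence on this edge; j may still be reached another way
            plan[j] = translated
            queue.append(j)
    return plan


acq = {"Paolo": {"Master"}, "Master": {"Paolo", "Luciano"}}
rel_map = {("Paolo", "Master"): {"Cust": "Cust"},
           ("Master", "Luciano"): {"Cust": "Customer"}}
print(plan_local_global("Paolo", "Cust", acq, rel_map))
# {'Paolo': 'Cust', 'Master': 'Cust', 'Luciano': 'Customer'}
```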
Query evaluation
• Query evaluation can be done in different modes:
  • Immediately;
  • ASAP -- as soon as all relevant databases are connected (may be a long wait...);
  • At time T -- evaluation is done at a particular time;
  • Subscriptively -- the query is evaluated periodically.
• Global queries are obviously harder to evaluate in the absence of guarantees of connectivity among the databases of the multidatabase.
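The slides name the evaluation modes but not the scheduling policy. The following sketch shows one possible readiness test per mode; it assumes numeric clock times and treats Immediately as "evaluate now against whatever databases are reachable", which is an interpretation rather than something the slides specify. `Mode` and `should_evaluate` are hypothetical.

```python
from enum import Enum, auto
from typing import Optional


class Mode(Enum):
    IMMEDIATE = auto()
    ASAP = auto()
    AT_TIME = auto()
    SUBSCRIPTIVE = auto()


def should_evaluate(mode: Mode, now: float, relevant: set, connected: set,
                    at: Optional[float] = None, period: Optional[float] = None,
                    last_run: Optional[float] = None) -> bool:
    """Decide whether a pending query should be evaluated at time `now`."""
    if mode is Mode.IMMEDIATE:
        return True  # evaluate right away, against whichever databases are reachable
    if mode is Mode.ASAP:
        return relevant <= connected  # wait until every relevant database is connected
    if mode is Mode.AT_TIME:
        return last_run is None and now >= at  # evaluate once, at (or after) time T
    if mode is Mode.SUBSCRIPTIVE:
        return last_run is None or now - last_run >= period  # re-evaluate periodically
    raise ValueError(mode)


print(should_evaluate(Mode.ASAP, 0, {"Master", "Paolo"}, {"Paolo"}))  # False
```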
A Data Model for Multidatabases
• A Multidatabase system consists of one or more databases and a set of coordination rules.
• Available operations include:
  • Add or delete a database (as an acquaintance);
  • Update or query a database;
  • Add, delete or update a coordination rule.
• Each database shares coordination and correspondence rules only with the databases it is acquainted with.
Formal Semantics for MDM
• A model in Local Model Semantics (hereafter LMS) is a pair MDB = <{DBi},{rij}ij>, where
  • DBi is a relational database (à la [Reiter84]) over schema Li; the domain of values of the database, Domi, is assumed to be finite;
  • rij is a binary relation over Domi × Domj which defines correspondences between values in the domains of databases DBi and DBj.
Local Satisfiability
• MDB |= i:f iff DBi |= f
• A Local Query on a database i is an open formula f(x1,...,xn) in the language Li.
• The result of a Local Query i:f(x1,...,xn) is the set of tuples (d1,...,dn) ∈ Domi × ... × Domi such that DBi |= f(d1,...,dn).
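Because Domi is finite, the result of a local query can, in principle, be computed by brute-force enumeration of Domi × ... × Domi. The sketch below does exactly that on a toy DBi; the two-column Cust relation (phone dropped), the data, and the name `local_query` are assumptions, and a real DBMS would of course not evaluate queries this way.

```python
from itertools import product


def local_query(domain, n: int, f) -> set:
    """Result of i:f(x1,...,xn): all tuples over Dom_i^n for which DB_i |= f holds."""
    return {t for t in product(domain, repeat=n) if f(*t)}


# A toy DB_i: one relation Cust(name, addr) and its finite domain of values
cust = {("Rossi", "Via Roma 1"), ("Bianchi", "Via Verdi 2")}
dom_i = {"Rossi", "Bianchi", "Via Roma 1", "Via Verdi 2"}

# f(x) = exists y. Cust(x, y), i.e. "x is a customer"
print(sorted(local_query(dom_i, 1, lambda x: any((x, y) in cust for y in dom_i))))
# [('Bianchi',), ('Rossi',)]
```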
Global Formulas
• Global formulas are built from local formulas of the form i:f(x) using the inter-database connectives and, or, not, --> (implication), and the quantifiers foralli, existsi (quantifiers over the domain Domi).
• Note that these are different from the local database connectives. For instance:
  • A --> B is not logically equivalent to not(A) or B;
  • Quantification is always done with respect to the domain of one database, and we write foralli x A(x).
Satisfiability for and and forall
• MDB |= A and B iff MDB |= A and MDB |= B
• MDB |= foralli x.A(x) iff for all d ∈ Domi, MDB |= A[x/d i]
  where A[x/d i] is obtained by substituting each free occurrence of x in A in the context j with rij(d)
• Note that if A(x) contains expressions local to a database DBj other than DBi, then these expressions have to be satisfied by all values of Domj that correspond to values of Domi, i.e.,
  MDB |= foralli x j:f(x) iff for all b ∈ Domj such that b ∈ rij(a) for some a ∈ Domi, DBj |= f(b)
A Shorthand Notation
• In the coordination rules shown earlier, quantifiers are implicit and not attached to any particular database. As before, we interpret such a rule to mean that the quantification holds in every database mentioned within its scope, i.e.,
  M:C(n,x) and P:C’(n,y) --> x=y
  means
  forall M:n,x,y [M:C(n,x) and P:C’(n,y) --> x=y] and
  forall P:n,x,y [M:C(n,x) and P:C’(n,y) --> x=y]
• Of course, if the domains involved are isomorphic, this expansion is not necessary.
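The expansion is purely syntactic, so it can be sketched as a string transformation. This assumes database prefixes are written as Name: directly before an atom and that the caller supplies the bound variables; `expand_shorthand` is a hypothetical helper, and it does not check whether the domains are isomorphic.

```python
import re


def expand_shorthand(rule: str, variables: list) -> str:
    """Expand an unlocalized quantification into one quantified copy per database in the rule."""
    databases = list(dict.fromkeys(re.findall(r"\b([A-Za-z]\w*):", rule)))  # in order, no repeats
    bound = ",".join(variables)
    return " and\n".join(f"forall {db}:{bound} [{rule}]" for db in databases)


print(expand_shorthand("M:C(n,x) and P:C’(n,y) --> x=y", ["n", "x", "y"]))
# forall M:n,x,y [M:C(n,x) and P:C’(n,y) --> x=y] and
# forall P:n,x,y [M:C(n,x) and P:C’(n,y) --> x=y]
```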
Proof Theory
• To be added by Luciano
Soundness and Completeness
• State the S&C theorem, as well as the theorem that generalizes Reiter’s result.
Related Work
• There has been much related research on replicated databases, i.e., distributed databases in which some data are replicated on different nodes of the distributed system.
• A distributed, replicated database is coherent if the replicated data are consistent at all times.
• There are many proposals for controlling distributed, replicated databases with relaxed coherency.
• Relaxed coherency means that replicated data are allowed to diverge temporarily (bounded relaxed coherency), or possibly forever (unbounded relaxed coherency).
Relaxed Coherency Schemes
• Update (preferably all) copies:
  • ROWA -- read one, write all;
  • ROWA available -- read one, write all available nodes;
• Update selected copies:
  • Primary site -- one site stores the master copy;
  • Quorum protocols -- pick a subset of nodes to be updated, and read from several nodes;
  • Bounded coherence -- update eventually;
  • Epidemic algorithms -- propagate updates through a spreading activation algorithm.
[Ceri91], [Beuter96], [Nicola99]
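To make the quorum idea concrete, here is a minimal sketch of a majority-quorum scheme, one textbook instance of the quorum protocols listed above rather than a scheme taken from [Ceri91], [Beuter96] or [Nicola99]. Each copy stores a (version, value) pair; the replica count, quorum sizes and function names are illustrative.

```python
def quorums_intersect(n: int, r: int, w: int) -> bool:
    """Classic quorum conditions: every read quorum meets every write quorum (r + w > n),
    and any two write quorums overlap (2w > n)."""
    return r + w > n and 2 * w > n


def write(replicas: list, targets: range, value) -> None:
    """Write to the chosen write quorum, stamping a version newer than any seen there."""
    version = max(replicas[k][0] for k in targets) + 1
    for k in targets:
        replicas[k] = (version, value)


def read(replicas: list, targets: range):
    """Read from a read quorum and return the value carrying the highest version."""
    return max(replicas[k] for k in targets)[1]


replicas = [(0, "old")] * 5           # n = 5 copies of one data item
assert quorums_intersect(5, r=3, w=3)
write(replicas, range(0, 3), "new")    # write quorum: copies 0, 1, 2
print(read(replicas, range(2, 5)))     # read quorum: copies 2, 3, 4 -> new
```

Because the read quorum and the write quorum must share at least one copy, the read always sees the newest version, while copies 3 and 4 are allowed to lag behind.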
Discussion
• Much of this work is relevant to the implementation of the proposed Multidatabase model.
• The key difference in our proposal is that it is based on a local notion of inconsistency, assumes no global schema and no global coordination, and treats coordination rules as soft constraints of variable force.
• Moreover, data replication is only a special case of the situations where coordination is useful.
Research Problems
• A formal semantics for the Multidatabase Model, as sketched in the previous slides.
• Efficient global query processing techniques (exploiting parallelism).
• A formal transaction model for coordination rules, supporting ‘soft’ enforcement mechanisms.
• Efficient implementation techniques for enforcing coordination rules using a range of protocols.
• Extending all of the above to multidatabases that involve data sources other than relational databases (such as OODBs, websites, ...).
• ...more...
References
• [Beuter96] Beuter, T., Dadam, P., “Principles of Replication Control in Distributed Database Systems”, Informatik Forschung und Technik 11(4), 203-212, 1996 (in German).
• [Ceri91] Ceri, S., Houtsma, M., Keller, A., Samarati, P., “A Classification of Update Methods for Replicated Databases”, technical report STAN-CS-91-1392, Stanford University, October 1991.
• [Nicola99] Nicola, M., “Performance Evaluation of Distributed, Replicated, and Wireless Information Systems”, Dissertation, report no. 99-10, Fachgruppe Informatik, RWTH Aachen, 1999.
• [Reiter84] Reiter, R., “Towards a Logical Reconstruction of Relational Database Theory”, in Brodie, M., Mylopoulos, J., Schmidt, J. (eds.), On Conceptual Modelling: Perspectives from Artificial Intelligence, Databases and Programming Languages, Springer-Verlag, 1984.
• [Ozsu99] Özsu, M. T., Valduriez, P., Principles of Distributed Database Systems, 2nd edition, Prentice Hall, 1999.