Coordinating Peer-to-Peer Information Sources
Fausto Giunchiglia, University of Trento
AOIS’02 - June 02, 2002
The talk
• Intuitions
• The underlying theory: the Local Relational Model (an application of the Local Models Semantics [Ghidini and Giunchiglia, AIJ 2001])
• Some theoretical results
• A VERY PRELIMINARY logical architecture
• … and agents?
INTUITIONS
Peer-to-Peer (P2P) Computing
• Peers come and go, but must nevertheless be able to interoperate.
• There are many examples outside the database field:
  - Napster: a shared directory of available music, plus client software to read/write the directory and import/export files.
  - Gnutella: a decentralized group-membership and search protocol, mainly used for file sharing.
  - Groove: a secure shared space among intermittently connected systems, with no central server.
  - …
Is There a Role for P2P Databases?
• There is hardly any literature:
  - The WebDB '01 paper (Gribble, Halevy, Ives, Rodrig, Suciu) focuses on data placement.
  - This implies some control over data placement.
  - They are serious about building a system ("Piazza").
• Is it really a new research problem? Or only a new application with a lot of hype around it?
• Compare it with the work on data integration (local-as-view and global-as-view approaches). Can't we just apply the same techniques?
Data integration: a snapshot
• A global schema (defined at design time).
• Integration is defined at design time by mapping the local databases into the global database:
  - either the global schema is primitive (LAV: local-as-view), or the local schemas are primitive (GAV: global-as-view).
• In all cases: take one domain of interpretation (as implicitly defined by the global schema) and MAP all individuals, relations and attributes of the databases to be integrated into it.
• Want correctness (query containment).
• But:
  - What if a new node comes in?
  - Can we really deal with completely autonomous nodes?
  - What about autonomy at run time (e.g., schema changes)?
  - …
Coordinating P2P databases: is it a new research problem? - 1
Domain characteristics:
• Autonomy: peer databases are largely independent (in their language, their contents, how they answer queries, …). They may be incomplete, overlapping, semantically heterogeneous, mutually inconsistent, …
• Dynamicity: nodes come and go … and maybe come back …; schemas, attributes and values may change over time, …
• You know something about the peer databases, but almost never everything. This knowledge is hard to maintain and may be obsolete.
Is it a new research problem? - 2
Solution desiderata:
• Need scalability in the number of nodes.
• Want "incrementality": answer quality should grow with the effort spent on developing a solution (design time) and on getting "good" answers (run time).
• Correctness and completeness (at design or run time) should be limit cases, because most of the time they are too costly to implement.
• Want robustness with respect to the autonomy of peer databases.
Is it a new research problem? - 3
Solution characteristics:
• Keep autonomy and add as much coordination as can be afforded (see incrementality).
• A notion of good-enough answer, as a function of coordination effort.
NOTE: Coordination is NOT (data) integration.
• Integration is defined once and for all at design time. Coordination may change at run time.
• Unlike data integration, there is no global schema. By the way, what is a global schema in the P2P domain? How much are we willing to pay to approximate it … and to maintain it over time?
• …
The Local Relational Model
A Motivating Example – 1
• Scenario: databases of medical patients.
  - Complete integration is likely to be infeasible.
  - But dynamic integration of the databases relevant to one patient could be of high value.
A Motivating Example – 2
• Consider 3 databases, one table per DB:
  - f: family doctor, f:Prescription(PatID, treatment, disease)
  - p: pharmacist, p:Medication(PID, Prod, PrescriptionID)
  - h: hospital, h:Patients(PATid, disease, in, out)
• A given patient may be described in all 3 databases.
• But the databases might use different patient-id formats and disease descriptions.
• When a patient is injured on a ski holiday in another country, yet more databases need to get involved.
Domain Relations
• Each database $DB_i$ has its language $L_i$, with a set $A_i$ of unary predicates for attributes, a set $DOM_i$ of constant symbols for elements, and a set $R_i$ of predicates for relations.
• Take a set of such databases $DB_i$, $i \in I$.
• Define the domain relation $r_{ij}$ as a subset of $DOM_i \times DOM_j$: $r_{ij}$ is the set of pairs $\langle d_i, d_j \rangle$ where, intuitively, $d_i$ and $d_j$ (usually different constants) stand for the same object in the world.
• Each row $\langle d_1, d_2 \rangle$ in the domain relation $r_{ik}$ specifies that value $d_1$ in database $i$ corresponds to value $d_2$ in database $k$.
• Clearly, having one domain per database is a simplification, adopted just for notational convenience.
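As a concrete reading of the definition, a domain relation can be stored as a finite set of pairs, and $r_{ij}(d)$ can be read as the set of all constants of $DOM_j$ paired with $d$. The following is a minimal Python sketch under that assumption. The class name `DomainRelation`, its `image` method and the constants `A7`, `7A`, `B2` are hypothetical illustrations, not part of the model.

```python
from typing import FrozenSet, Hashable, Iterable, Set, Tuple

Pair = Tuple[Hashable, Hashable]


class DomainRelation:
    """A domain relation r_ij: a finite subset of DOM_i x DOM_j, stored as pairs."""

    def __init__(self, source: str, target: str, pairs: Iterable[Pair]):
        self.source = source                      # name of database i
        self.target = target                      # name of database j
        self.pairs: FrozenSet[Pair] = frozenset(pairs)

    def image(self, d: Hashable) -> Set[Hashable]:
        """r_ij(d): every constant of DOM_j paired with d.

        r_ij is a relation, not a function, so the result may be empty
        (d has no counterpart in j) or may hold several constants.
        """
        return {dj for (di, dj) in self.pairs if di == d}


# Hypothetical constants: the same patient is "A7" in i and "7A" in j;
# "B2" has two spellings in j.
r_ij = DomainRelation("i", "j", [("A7", "7A"), ("B2", "2B"), ("B2", "2b")])
print(r_ij.image("A7"))  # {'7A'}
print(r_ij.image("C9"))  # set()
```

Returning a set instead of a single value keeps partial and one-to-many correspondences explicit.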
A Motivating Example – 3
• Consider the previous 3 databases:
  - f: family doctor, f:Prescription(p12, Aspirin, Headache)
  - p: pharmacist, p:Medication(31, Aspirin-Bayer, fd23)
  - h: hospital, h:Patients(r3, car_accid, 1/1/01, 3/1/01)
• We may have:
  - $\langle r3, p12 \rangle \in r_{hf}$
  - $\langle 31, p12 \rangle \in r_{pf}$
  - $\langle p12, r3 \rangle \in r_{fh}$, if we have the inverse mapping
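The pairs above can be manipulated directly. The minimal Python sketch below uses the slide's values to check that $r_{fh}$ is the inverse of $r_{hf}$. It then derives a pharmacist-to-hospital mapping by composing $r_{pf}$ with $r_{fh}$, which is the transitive case listed among the domain relation examples. The helper names `compose` and `inverse` are hypothetical. Whether a peer should actually derive mappings this way is a design choice, not something the model imposes.

```python
def compose(r_ij, r_jk):
    """Relational composition: (a, c) whenever (a, b) is in r_ij and (b, c) is in r_jk."""
    return {(a, c) for (a, b) in r_ij for (b2, c) in r_jk if b == b2}


def inverse(r_ij):
    """Swap every pair, giving a candidate for r_ji."""
    return {(b, a) for (a, b) in r_ij}


# Pairs taken from the example slide.
r_hf = {("r3", "p12")}   # hospital id -> family-doctor id
r_pf = {(31, "p12")}     # pharmacist id -> family-doctor id
r_fh = {("p12", "r3")}   # family-doctor id -> hospital id

print(inverse(r_hf) == r_fh)   # True: r_fh is the inverse mapping of r_hf
r_ph = compose(r_pf, r_fh)     # pharmacist -> hospital, derived via f
print(r_ph)                    # {(31, 'r3')}
```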
Domain relations … more
• … Suppose we have:
  - $\langle r3, p12 \rangle \in r_{hf}$
  - $\langle 31, p12 \rangle \in r_{pf}$
  - $\langle p12, r3 \rangle \in r_{fh}$
  - …
• NOTE: We do not collapse the local domains into a universal domain (as in data integration). We keep them distinct, and introduce mappings between pairs of domains as objects. Domain relations are explicitly manipulated at run time to implement coordination between peer databases.
Domain Relations – Examples
• $r_{ij}$ may be partial and not surjective (most often the case).
• $r_{ij}$ and $r_{ji}$ need not be inverses of each other: in general $r_{ji}(r_{ij}(x)) \neq x$. For example, suppose $DB_i$ contains length measurements in meters and $DB_j$ in kilometers. One can have:
  - $r_{ij}(x) = \mathrm{roundToClosestK}(x)$, e.g., $r_{ij}(653) = 1$, $r_{ij}(453) = 0$
  - $r_{ji}(x) = x \cdot 1000$, e.g., $r_{ji}(1) = 1000$
  - hence $r_{ji}(r_{ij}(653)) = 1000 \neq 653$
• $r_{ij} = r_{ji}^{-1}$: different but equivalent representations of the same domain.
• $r_{ij} = r_{ji} = \emptyset$: disjoint domains (what if only one of them is empty?).
• $r_{ik}$ = $r_{ij}$ composed with $r_{jk}$: transitive mappings among domains.
• $r_{ij}(d_s) = \emptyset$, with $d_s \subseteq DOM_i$: keep $d_s$ secret.
• $\forall d_1, d_2 \in DOM_i.\; d_1 <_i d_2 \rightarrow \forall d'_1 \in r_{ij}(d_1), d'_2 \in r_{ij}(d_2).\; d'_1 <_j d'_2$: order preservation (e.g., currency exchange).
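The meters/kilometers example and the order-preservation condition can be checked mechanically. This Python sketch assumes non-negative integer lengths and rounds ties upward. Here both domain relations are plain functions, so the condition's "for all images" reduces to comparing the single image of each element. The function names are hypothetical.

```python
def r_ij(meters: int) -> int:
    """Meters -> kilometers, rounded to the closest km (ties round up; assumes meters >= 0)."""
    return (meters + 500) // 1000


def r_ji(km: int) -> int:
    """Kilometers -> meters."""
    return km * 1000


def preserves_order(r, domain) -> bool:
    """d1 < d2 implies r(d1) < r(d2), for every pair of elements of `domain`."""
    return all(r(d1) < r(d2) for d1 in domain for d2 in domain if d1 < d2)


print(r_ij(653), r_ij(453))                # 1 0
print(r_ji(r_ij(653)))                     # 1000: the round trip does not give back 653
print(r_ij(r_ji(1)))                       # 1
print(preserves_order(r_ji, range(0, 5)))  # True
print(preserves_order(r_ij, [600, 700]))   # False: both map to 1 km
```

Rounding is a lossy mapping, which is why it fails strict order preservation. A fixed-rate currency conversion is strictly increasing, so it satisfies the condition.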
P2P Coordination
• Instead of a global schema, assume each peer has:
  - pair-wise coordination formulas that specify interdependencies;
  - binary domain relations that specify how the symbols used in one database translate to the symbols used in another database.
• Coordination formulas and domain relations can only refer to acquaintances.
• Use domain relations and coordination formulas for query and update processing.
Coordination Formulas – Examples
$\forall(p{:}x).\forall(p{:}y).\big(p{:}\exists z.\, medication(x, z, y) \rightarrow f{:}treatments(x, home, y)\big)$
$\forall(h{:}x).\forall(h{:}y).\big(h{:}\exists z_1, z_2.\, patient(x, y, z_1, z_2) \rightarrow f{:}treatments(x, hospital, y)\big)$
"For each row in the pharmacist's medication table and in the hospital's patient table, there is a corresponding row in the treatments table of the family doctor database."
NOTE: see the indexing of formulas and variables (p:, h:, f:).
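To make the first formula operational, the Python sketch below lists the `f:treatments` rows that the formula requires but `f` does not yet contain. These rows are exactly what coordination would have to propagate from `p` to `f`. Two assumptions apply. First, $r_{pf}(d)$ is read as "every related constant", and a constant with no counterpart imposes nothing. Second, the pair `("fd23", "fd23")` and the empty treatments table are hypothetical data. Only the medication row and the pair `(31, "p12")` come from the slides.

```python
from itertools import product

# p:Medication(PID, Prod, PrescriptionID): the row from the example slide.
medication_p = {(31, "Aspirin-Bayer", "fd23")}
# f:treatments(patient, place, ref): hypothetical, initially empty.
treatments_f = set()
# r_pf: (31, "p12") is from the slides; ("fd23", "fd23") is a hypothetical
# identity mapping for the prescription id.
r_pf = {(31, "p12"), ("fd23", "fd23")}


def image(r, d):
    """Constants related to d by the domain relation r."""
    return {dj for (di, dj) in r if di == d}


def missing_rows(medication, treatments, r):
    """Rows f:treatments(x, home, y) required by
    forall(p:x).forall(p:y).(p: exists z. medication(x, z, y) -> f: treatments(x, home, y))
    that f does not contain. x and y range over DOM_p, so they are translated
    into DOM_f through r before the consequent is checked in f."""
    missing = set()
    for (x, _z, y) in medication:          # the row itself witnesses "exists z"
        for fx, fy in product(image(r, x), image(r, y)):
            if (fx, "home", fy) not in treatments:
                missing.add((fx, "home", fy))
    return missing


print(missing_rows(medication_p, treatments_f, r_pf))  # {('p12', 'home', 'fd23')}
treatments_f.add(("p12", "home", "fd23"))
print(missing_rows(medication_p, treatments_f, r_pf))  # set()
```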
Coordination formulas
• Coordination formulas are built from atomic formulas $i{:}f(x)$, where $f(x)$ is a first-order formula, using the standard connectives $\wedge$, $\vee$, $\rightarrow$ and the quantifiers $\forall$, $\exists$.
• Variables quantified over one DB may have to be interpreted in other DBs. This mapping exploits domain relations. Consider, e.g.:
  - $\forall(i{:}x).\, j{:}P(x)$
    "for each object $d_i$ in $DOM_i$, the corresponding object $d_j = r_{ij}(d_i)$ in $DOM_j$ has the property $P$"
  - $\forall(i{:}x).\,(i{:}P(x) \rightarrow j{:}Q(x) \wedge k{:}R(x))$
    "for each object $d_i$ in $DOM_i$, if $P$ holds of $d_i$ …"
• Quantification is always done with respect to the domain of one database. However, notice the difference between:
  - $\forall(i{:}x).A(x)$, with $A(x)$ a coordination formula;
  - $i{:}\forall x.B(x)$, with $B(x)$ a first-order formula. It holds iff $\forall(i{:}x).\, i{:}B(x)$ holds.
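The distinction between cross-database and purely local quantification can be sketched in Python. In `forall_across`, the variable ranges over $DOM_i$, but the property is tested in $DB_j$ on the mapped constants. In `forall_local`, the variable and the formula both live in $DB_i$. The function names are hypothetical, and the data are hypothetical except for the pair `("r3", "p12")`. Treating a constant with no counterpart (here `r4`) as vacuously satisfying the cross-database formula is an assumption about how partial domain relations are read.

```python
def image(r, d):
    """Constants related to d by the domain relation r (a set of pairs)."""
    return {dj for (di, dj) in r if di == d}


def forall_across(dom_i, r_ij, holds_in_j):
    """forall(i:x). j:P(x): x ranges over DOM_i, P is tested in DB_j on r_ij(x)."""
    return all(holds_in_j(dj) for di in dom_i for dj in image(r_ij, di))


def forall_local(dom_i, holds_in_i):
    """i: forall x. B(x), i.e. forall(i:x). i:B(x): no domain relation involved."""
    return all(holds_in_i(d) for d in dom_i)


dom_h = {"r3", "r4"}          # hypothetical constants of the hospital DB
r_hf = {("r3", "p12")}        # from the example slides; r4 has no counterpart in f
patients_h = {"r3"}           # hypothetical: ids with a row in h:Patients
prescriptions_f = {"p12"}     # hypothetical: ids with a row in f:Prescription

print(forall_across(dom_h, r_hf, lambda d: d in prescriptions_f))  # True
print(forall_local(dom_h, lambda d: d in patients_h))              # False: r4 has no row
```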
Higher Level Correspondences
• One can generalize the domain relation to correspondences at higher meta-levels:
  - constant to constant, e.g., 'one' → 'uno', or CAN$1.00 → US$0.65;
  - table to table, e.g., Cust → Customer;
  - column to column, e.g., name(Cust) → nm(Customer).
• This is also captured in coordination formulas.
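The Python sketch below implements the three kinds of correspondence as lookup tables that rewrite a simple selection ("rows of a table whose column equals a value") into a peer's vocabulary. Constant-level currency correspondences are handled as a conversion function. The table, column and constant mappings and the CAN$ to US$ rate come from the slide. The dictionary names, the function names and the query shape are hypothetical.

```python
from decimal import Decimal

TABLES = {"Cust": "Customer"}                       # table to table
COLUMNS = {("Cust", "name"): ("Customer", "nm")}    # column to column
CONSTANTS = {"one": "uno"}                          # constant to constant
CAD_TO_USD = Decimal("0.65")                        # rate from the slide: CAN$1.00 -> US$0.65


def cad_to_usd(amount: Decimal) -> Decimal:
    """Constant-level correspondence between currency values, rounded to cents."""
    return (amount * CAD_TO_USD).quantize(Decimal("0.01"))


def translate_selection(table: str, column: str, value: str):
    """Rewrite 'rows of table where column = value' into the peer's vocabulary.
    Constants with no listed correspondence are passed through unchanged."""
    peer_table = TABLES[table]
    _, peer_column = COLUMNS[(table, column)]
    return peer_table, peer_column, CONSTANTS.get(value, value)


print(translate_selection("Cust", "name", "one"))  # ('Customer', 'nm', 'uno')
print(cad_to_usd(Decimal("1.00")))                 # 0.65
```

`Decimal` is used so that the monetary conversion is exact rather than subject to binary floating-point error.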
Answering Queries
• Local queries: treated as if no peer databases existed. They are first-order formulas of the form
  $A(x) \rightarrow q(x)$
  with $A(x)$ a first-order formula, and $x$ and $q$ as below.
• Global queries: coordination formulas of the form
  $A(x) \rightarrow i{:}q(x)$
  where
  - $A(x)$ is a coordination formula,
  - $x$ has $n$ variables,
  - $q$ is a new $n$-ary predicate symbol,
  - $i$ is the database that receives the query.
• The answer to a global query is
  $\{\, d \in DOM_i^n \mid \exists(i{:}x).(A(x) \wedge i{:}x = d) \text{ holds} \,\}$
Answering Queries – An example
• Consider the query below, submitted to database h:
  $((i{:}P(x) \wedge j{:}R(y)) \vee k{:}S(x, y)) \rightarrow h{:}q(x, y)$
• Three steps:
1. Evaluate P, R, S in i, j, k (respectively).
2. Map the results via $r_{ih}$, $r_{jh}$, $r_{kh}$ to sets $s_i$, $s_j$, $s_k$.
3. Compute $(s_i \times s_j) \cup s_k$.
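The three steps can be traced on toy data. This Python sketch reads the connectives as $\wedge$ and $\vee$, so the conjunction over the distinct variables $x$ and $y$ becomes a cross product, and the disjunction becomes a union. Step 1 is replaced by hypothetical precomputed local results, and all constants and domain relations are hypothetical. Only the overall shape follows the slide.

```python
from itertools import product


def image(r, d):
    """Constants related to d by the domain relation r."""
    return {dj for (di, dj) in r if di == d}


def map_set(values, r):
    """Step 2 for unary results: translate each constant into h's domain."""
    return {d2 for d in values for d2 in image(r, d)}


def map_pairs(pairs, r):
    """Step 2 for binary results: translate both columns into h's domain."""
    return {(a2, b2) for (a, b) in pairs for a2 in image(r, a) for b2 in image(r, b)}


# Step 1 (hypothetical local results): P evaluated in i, R in j, S in k.
P_i = {"a1", "a2"}
R_j = {"b1"}
S_k = {("c1", "c2")}

# Hypothetical domain relations into h.
r_ih = {("a1", "x1"), ("a2", "x2")}
r_jh = {("b1", "y1")}
r_kh = {("c1", "x3"), ("c2", "y3")}

s_i, s_j, s_k = map_set(P_i, r_ih), map_set(R_j, r_jh), map_pairs(S_k, r_kh)
answer = set(product(s_i, s_j)) | s_k   # step 3: (s_i x s_j) U s_k
print(sorted(answer))  # [('x1', 'y1'), ('x2', 'y1'), ('x3', 'y3')]
```

Mapping the partial results into h's domain before combining them lets the final set operations run entirely in h's vocabulary.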
SOME THEORETICAL RESULTS
Theoretical Results – 1
• Provide a model theory by defining the Local Relational Model (LRM) in terms of relational spaces, where a relational space is defined as a pair
  <set of local databases, set of pairwise domain relations>
• Provide a notion of satisfiability and logical consequence of coordination formulas with respect to relational frames.
• Provide inference rules for using coordination formulas.
• Prove them sound and complete with respect to the LRM.
Theoretical Results – 2
• Define a generalized relational theory as a theory with domain closure, distinct domain values, and a finite number of possible relation extensions (closed-world assumption).
• Define a relational multi-context system <T, R> as a family of relational languages (one per database), each with a generalized relational theory (in T), together with a set of coordination formulas (in R).
• Prove that for any relational multi-context system there is a unique maximal relational space that satisfies it. (This generalizes Reiter's result on the CWA for single databases.)
Theoretical Results – 3
Given a multi-context system <T, R> that represents the peer databases, the answer to a query
$A(x) \rightarrow i{:}q(x)$
is the set of all $d$ such that
$\{i{:}T_i\}_{i \in I}, R \vdash \exists(i{:}x).(A(x) \wedge i{:}x = d)$
This result is the basis for a correct and complete query-answering mechanism (for a given set of coordination formulas … which may implement something totally different from the data integration approaches, LAV and GAV).
VERY PRELIMINARY HINTS OF A LOGICAL ARCHITECTURE
A proposed architecture (prelim.) – 1
Four basic ingredients:
1. Interest group: a set of nodes able to answer queries about a certain topic (e.g., tourism, medical care). Needed to compute the scope of query answering.
2. Acquaintance (with respect to a node and a given query): a node that is supposed to have information that can be used to answer the query.
3. Coordination rule (with respect to an acquaintance): says how to propagate the query forward and the results back.
4. Correspondence rule (with respect to an acquaintance): takes care of the semantic heterogeneity problem.
A proposed architecture (prelim.) – 2
From theory to practice:
1. Interest group: in the LRM, the set of databases in a relational frame.
2. Acquaintance (of a node n1): in the LRM, any node n2 for which there is a coordination formula involving n1 and n2.
3. Coordination rule: an implementation of coordination formulas, parametric on correspondence rules.
4. Correspondence rule: a set of rewrite rules that implement the language-dependent part of coordination formulas and take care of semantic heterogeneity (domain relations are implemented as special kinds of correspondence rules).
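Item 2 gives a purely syntactic test for acquaintance, which the following Python sketch implements. Each coordination formula is abstracted to the set of database names it mentions, and that abstraction is an assumption. Node n2 is then an acquaintance of n1 whenever some formula mentions both. The function name `acquaintances` is hypothetical. The sample input corresponds to the two formulas on the "Coordination Formulas – Examples" slide, which mention {p, f} and {h, f}.

```python
from typing import Dict, FrozenSet, Iterable, Set


def acquaintances(formulas: Iterable[FrozenSet[str]]) -> Dict[str, Set[str]]:
    """n2 is an acquaintance of n1 iff some coordination formula involves both.
    Each formula is abstracted to the set of databases it mentions."""
    acq: Dict[str, Set[str]] = {}
    for dbs in formulas:
        for n1 in dbs:
            acq.setdefault(n1, set()).update(dbs - {n1})
    return acq


acq = acquaintances([frozenset({"p", "f"}), frozenset({"h", "f"})])
print(acq)  # f is acquainted with p and h; p and h are each acquainted only with f
```

Because the definition only requires a shared formula, the resulting relation is symmetric. A directed variant would need to record which side of the implication each database appears on.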
Level 1 architecture – The P2P layer
• P2P layer
  - P2P functionality is an add-on.
• Local data source (LDS)
  - Database, file system, Web site, …
• User interface
  - User queries, results, …
• Query manager (QM) and update manager (UM)
  - responsible for query and update propagation;
  - manage coordination and correspondence rules, acquaintances, and interest groups.
• Wrapper
  - provides a translation layer between the QM and UM on one side and the LDS on the other.
Level 2 architecture – The Query manager
• Propagation planner
  - talks to the group manager.
• Query formation
  - responsible for forming outgoing queries, as well as for querying the local data source.
• Results handler
  - responsible for sending and receiving query results;
  - shows results to the user.
• Executed query history
  - prevents duplicate query execution.
• Acquaintances
• Interest groups
• Group management
  - used only by node managers, for managing groups and query propagation.
• Coordination and correspondence rules
Query propagation strategy
1. A node defines the query topic.
2. The node sends the Group Manager (GM) a request for the Query Scope (QS).
3. GM computes QS.
4. Node 1 sends the query to its acquaintances in QS, namely 2 and 4, and reports this fact to GM.
5. Nodes 2 and 4 send their answers to node 1.
6. Node 2 propagates the query to its acquaintances in QS, namely 4 and 6, and reports this fact to GM.
7. And so on…
8. Nodes that do not propagate the query any further report this fact to GM.
9. Propagation stops when a "no more propagation" message has been received from all boundary nodes (all reachable acquaintances have been reached).
[Figure: group manager GM and nodes 1 to 11, with message labels Q(topic), QS(topic) = (2, 4, 6, 8, 9, 11) and Res4.]
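The propagation steps can be simulated in Python as a flood restricted to the query scope, with each node's executed-query history discarding duplicate deliveries. Several simplifications apply. The GM's bookkeeping is collapsed into the loop ending when no message is pending. Messages are processed breadth-first, whereas real peers run concurrently. Returning answers to the sender is not modelled. The scope and the edges 1 → {2, 4} and 2 → {4, 6} follow the slide and figure. Every other edge, and the function name `propagate`, is hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def propagate(origin: int, acquaintances: Dict[int, List[int]],
              scope: Set[int]) -> Tuple[List[int], int]:
    """Flood a query from `origin` to acquaintances that lie in the query scope.

    Returns the nodes that executed the query (in the order reached) and the
    number of duplicate deliveries discarded by the executed-query history.
    """
    history: Set[int] = {origin}   # nodes that have already executed this query
    executed: List[int] = []
    duplicates = 0
    pending = deque((origin, p) for p in acquaintances.get(origin, []) if p in scope)
    while pending:                 # empty queue stands in for "no more propagation" everywhere
        sender, node = pending.popleft()
        if node in history:        # already executed: do not run it twice
            duplicates += 1
            continue
        history.add(node)
        executed.append(node)      # here node would return its answer to `sender` (not modelled)
        for peer in acquaintances.get(node, []):
            if peer in scope and peer != sender:
                pending.append((node, peer))
    return executed, duplicates


# Scope from the figure; 1 -> {2, 4} and 2 -> {4, 6} follow the slide, other
# edges are hypothetical. Nodes 3, 5, 7 and 10 lie outside the scope.
graph = {1: [2, 3, 4], 2: [1, 4, 6], 4: [1, 2, 8, 5], 6: [2, 9, 7],
         8: [4, 11], 9: [6, 10], 11: [8]}
qs = {2, 4, 6, 8, 9, 11}
print(propagate(1, graph, qs))  # ([2, 4, 6, 8, 9, 11], 2)
```

The two discarded deliveries are node 2's forward to node 4 and node 4's forward to node 2. This is the duplicate execution that the executed query history is meant to prevent.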
Summary
• Coordinating P2P information sources: keep autonomy, add (run-time) coordination. Be content with good-enough answers.
• In theory, model coordination using four notions: a set of local databases, domain relations, coordination formulas, and the global answer to a query.
• In practice, implement coordination using five notions: interest groups, acquaintances, coordination rules, correspondence rules, and a coordination algorithm.
• … and agents?
Published work (not much … yet)
• Paper on the LRM still unpublished, but see the project Web page.
• Paper on the basic ideas in WebDB 2002.
• Paper on the architecture in CIA 2002.
• These slides will soon be on my Web page.
The project Web page (to be put up soon) will be accessible from my Web page:
http://www.ict.unitn.it/~fausto/