Graphs and Functions:
Recurring Themes in Databases
Alex Poulovassilis
29th November 2001
Databases
• Databases store information of relevance to a group of users, e.g.
  • employees’ personal details, for a Personnel department
  • employees’ income details, for a Payroll department
  • details of molecular structure and interaction, for a Drug company
  • details of TV broadcasts and ratings, for a TV company

Data models
• The information stored in a database is expressed using a data model
• The binary relational data model is a very simple data model
• In this model, information is represented using entities and binary relationships between them
• These can be represented as the nodes and edges of a graph, e.g. here is the schema of a ViewingFigures database:
[Figure: schema graph of the ViewingFigures database]

Data and Schema
• The schema of a database defines the type and format of the actual data – it is part of the database’s metadata
• The data in the database conforms to the schema
• So a fragment of the ViewingFigures data might be:
[Figure: fragment of the ViewingFigures data]

The TriStarp Project
• The TriStarp research project, led by Prof Peter King from the mid 1980s, aimed to
  (1) develop repository technology for binary relational information
  (2) develop languages for computing with this kind of information
• Mir Derakhshan worked on (1). Carol Small and I worked on (2). We were supported by CASE studentships from IBM UK Labs, Prof Geoff Sharman and Norman Winterbottom being our industrial supervisors.

Computing with Binary Relational Data
• There are two natural candidates for this:
  • logic languages - explored by Carol
  • functional languages - the topic of my PhD research, resulting in the FDL language (1990)

The Logic Approach
• Find all actors who star in programme P205:
  stars(P205,x?)
[Figure: Programme and Actor nodes joined by a stars edge, with P205 under Programme and x? under Actor]

The Logic Approach
• Find all programmes in which Kevin Bacon stars:
  stars(p?,'Kevin Bacon')
[Figure: Programme and Actor nodes joined by a stars edge, with p? under Programme and Kevin Bacon under Actor]

The Logic Approach
• Find all actors who have starred with Kevin Bacon:
  stars(p?,'Kevin Bacon'),stars(p?,x?)
[Figure: Programme and Actor nodes joined by a stars edge, with p? under Programme linked by stars edges to Kevin Bacon and x? under Actor]

The Functional Approach
The functional approach interprets binary relationships as functions, leading to the so-called functional data model
[Figure: Programme and Actor nodes, with function stars from Programme to Actor and its inverse inv_stars from Actor to Programme]

The Functional Approach
• Find all actors who star in programme P205:
  stars P205
[Figure: Programme and Actor nodes linked by stars and inv_stars]

The Functional Approach
• Find all programmes in which Kevin Bacon stars:
  inv_stars 'Kevin Bacon'
[Figure: Programme and Actor nodes linked by stars and inv_stars]

The Functional Approach
• Find all actors who have starred with Kevin Bacon:
  [x | p ∈ inv_stars 'Kevin Bacon'; x ∈ stars p]
[Figure: Programme and Actor nodes linked by stars and inv_stars]

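The three functional queries can be approximated in Python as a minimal sketch. The sample edges and the names `STARS_EDGES` and `co_stars` are hypothetical, and sets are used in place of FDL lists, so duplicates are dropped. As with the comprehension on the slide, the result includes Kevin Bacon himself.

```python
from collections import defaultdict

# Hypothetical sample data: (programme, actor) pairs of the stars relationship.
STARS_EDGES = [
    ("P205", "Kevin Bacon"),
    ("P205", "Actor A"),
    ("P310", "Kevin Bacon"),
    ("P310", "Actor B"),
    ("P412", "Actor C"),
]

# Index the edges in both directions so each function is a simple lookup.
_stars = defaultdict(set)
_inv_stars = defaultdict(set)
for programme, actor in STARS_EDGES:
    _stars[programme].add(actor)
    _inv_stars[actor].add(programme)


def stars(programme):
    """Actors starring in the given programme."""
    return set(_stars.get(programme, set()))


def inv_stars(actor):
    """Programmes in which the given actor stars."""
    return set(_inv_stars.get(actor, set()))


def co_stars(actor):
    """Counterpart of [x | p ∈ inv_stars actor; x ∈ stars p]."""
    return {x for p in inv_stars(actor) for x in stars(p)}
```

Each generator in the comprehension becomes one `for` clause, in the same left-to-right order.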
More complex queries
Find the most popular programme showing at 10pm on 1st November, 2001:
  let maxViewers = max [viewers s | s ∈ inv_date (1,11,2001);
                                    (start s) <= 2200; (end s) > 2200] in
  [of s | s ∈ inv_viewers maxViewers]

Derived Functions
Find the most popular programme showing at time t on date d:
  mostPopular t d =
    let maxViewers = max [viewers s | s ∈ inv_date d;
                                      (start s) <= t; (end s) > t] in
    [of s | s ∈ inv_viewers maxViewers]

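A rough Python counterpart of mostPopular might look as follows. The `Showing` record, its sample values and the empty-list guard are assumptions made for illustration. The slides do not say what max returns when no showing matches.

```python
from typing import NamedTuple


class Showing(NamedTuple):
    of: str        # programme being shown
    date: tuple    # (day, month, year)
    start: int     # 24-hour clock, e.g. 2130
    end: int
    viewers: int


# Hypothetical sample data.
SHOWINGS = [
    Showing("P205", (1, 11, 2001), 2130, 2230, 4_000_000),
    Showing("P310", (1, 11, 2001), 2200, 2300, 6_500_000),
    Showing("P412", (1, 11, 2001), 1900, 2000, 9_000_000),
    Showing("P518", (2, 11, 2001), 2200, 2300, 7_000_000),
]


def inv_date(d):
    """Showings broadcast on date d."""
    return [s for s in SHOWINGS if s.date == d]


def inv_viewers(n):
    """Showings watched by exactly n viewers."""
    return [s for s in SHOWINGS if s.viewers == n]


def most_popular(t, d):
    """Programmes of the showings with the highest audience at time t on date d."""
    audiences = [s.viewers for s in inv_date(d) if s.start <= t and s.end > t]
    if not audiences:          # assumption: no matching showing gives no result
        return []
    max_viewers = max(audiences)
    return [s.of for s in inv_viewers(max_viewers)]
```

As in the original definition, the final lookup goes through `inv_viewers` alone, so any showing with the same audience figure is returned whatever its date.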
Recursive functions
Find actors linked to Kevin Bacon via any number of edges labelled stars:
  linkedTo ['Kevin Bacon']
where:
  linkedTo result = let new = [x | y ∈ result;
                                   p ∈ inv_stars y;
                                   x ∈ stars p] in
                    if (subset new result)
                    then result
                    else linkedTo (new U result)
[Figure: Programme and Actor nodes, with edges labelled stars and linkedTo]

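The recursion stops at a fixpoint, when one more step along the stars edges yields no new actor. A small Python sketch of the same idea, over a hypothetical `STARS` table and with sets instead of lists:

```python
# Hypothetical sample data: programme -> set of actors starring in it.
STARS = {
    "P1": {"Kevin Bacon", "A"},
    "P2": {"A", "B"},
    "P3": {"C"},
}


def stars(programme):
    return STARS.get(programme, set())


def inv_stars(actor):
    return {p for p, cast in STARS.items() if actor in cast}


def linked_to(result):
    """Actors reachable from `result` via any number of stars edges."""
    new = {x for y in result for p in inv_stars(y) for x in stars(p)}
    if new <= result:                 # subset new result
        return result
    return linked_to(new | result)    # new U result
```

On a large graph the recursive form could exceed Python's recursion limit, so a loop would be the safer choice there. The recursive form is kept here to mirror the slide.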
Oracle of Bacon at Virginia
www.cs.virginia.edu/oracle

Bacon Number   No of People
0              1
1              1479
2              115203
3              285896
4              65055
5              4535
6              534
7              81
8              28
9              1
10             1
Total linkable actors: 472814

Higher-order functions
More generally:
  linkedTo s = complete (stars,inv_stars) s
where:
  complete (f,inv_f) result = let new = [x | b ∈ result;
                                             a ∈ inv_f b;
                                             x ∈ f a] in
                              if (subset new result)
                              then result
                              else complete (f,inv_f) (new U result)
[Figure: nodes A and B, with edges labelled f and linkedTo]

Collaboration Networks
Find all people linked to a person P via the author relationship:
  complete (author,inv_author) [P]
[Figure: Paper and Person nodes, with function author from Paper to Person and its inverse inv_author from Person to Paper]

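Because complete takes the function pair as a parameter, the same closure computation serves any binary relationship. The sketch below is one possible Python rendering, written as a loop rather than a recursion. The `AUTHORS` table and the person names in it are hypothetical.

```python
def complete(f, inv_f, seed):
    """Close `seed` under 'shares an f-source with', iterating to a fixpoint."""
    result = set(seed)
    while True:
        new = {x for b in result for a in inv_f(b) for x in f(a)}
        if new <= result:          # nothing new was reached
            return result
        result = result | new


# Hypothetical sample data: paper -> set of authors.
AUTHORS = {
    "paper1": {"P", "Q"},
    "paper2": {"Q", "R"},
    "paper3": {"S", "T"},
}


def author(paper):
    return AUTHORS.get(paper, set())


def inv_author(person):
    return {paper for paper, people in AUTHORS.items() if person in people}


def collaborators(person):
    """Counterpart of complete (author,inv_author) [P]."""
    return complete(author, inv_author, {person})
```

Passing a stars/inv_stars pair instead of `author` and `inv_author` would give the linkedTo function without changing `complete`.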
Acknowledgements…
If we ask the simpler query
  [x | p ∈ inv_author 'Alexandra Poulovassilis'; x ∈ author p]
we obtain the people with whom I have coauthored research papers:
  J.Bailey K.Benkerimi S.Courtenage P.Demetriades
  M.Derakhshan B.Heydecker S.Hild P.J.H.King
  M.Levene N.Lorentzos P.J.McBrien P.Newson
  E.Nonas R.Offen S.Reddi S.Schwarz C.Small E.Tuv
  P.T.Wood L.Xu
[Figure: Paper and Person nodes linked by author and inv_author]

Drawbacks of the Binary Relational Model
Despite its elegance, the binary relational model has some drawbacks:
(a) large binary relational schemas can be hard to understand
(b) it is not so natural for representing higher-dimensional relationships
The Hypernode Model
Drawback (a) led to research into nested-graph data models, with Mark Levene

Higher-dimensional relationships
An example of problem (b) is the 3-way relationship between Distribution companies, Programmes and TV companies, which has to be represented by an entity and 3 binary relationships:
[Figure: entity Supply linked by three binary relationships to DistrCo, TVCo and Programme]

The PFL Language
• This led to the development of a new functional language PFL, with Carol Small, which directly supports higher-dimensional relationships
• e.g. the supply relationship is accessed by a single selector function
    |supply : (DistrCo,Programme,TVCo) → [(DistrCo,Programme,TVCo)]
• Some examples:
    |supply (Any,P205,BBC)
    |supply (Any,Any,BBC)
    |supply (Any,P205,Any)

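A selector function of this kind can be imitated in Python by matching a pattern tuple against the stored tuples. This is only a sketch: the `ANY` sentinel stands in for PFL's Any, and the contents of `SUPPLY` are invented sample data.

```python
ANY = object()  # wildcard standing in for PFL's Any

# Hypothetical sample data: (DistrCo, Programme, TVCo) triples.
SUPPLY = [
    ("D1", "P205", "BBC"),
    ("D2", "P310", "BBC"),
    ("D1", "P205", "ITV"),
]


def supply(pattern):
    """Return the triples that agree with `pattern` on every non-wildcard position."""
    return [
        triple
        for triple in SUPPLY
        if all(p is ANY or p == v for p, v in zip(pattern, triple))
    ]
```

The three example calls on the slide then correspond to `supply((ANY, "P205", "BBC"))`, `supply((ANY, ANY, "BBC"))` and `supply((ANY, "P205", ANY))`.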
Active Databases
• Up to now, I have been looking at schema, data and derived database information
• In the 1990s a new kind of database information was being explored, namely event-condition-action rules of the form:
    on event if condition do action
• ECA rules make a database active in that it can automatically execute actions if events occur and conditions hold

Active PFL
• In a project during the mid 1990s, we extended PFL with ECA rules (with Swarup Reddi and Carol Small)
• For example:
    on  insert viewers
    if  [s | (s,n) ∈ |viewersInc (Any,Any); n < 500000]
    do  insert [s | (s,n) ∈ |viewersInc (Any,Any); n < 500000] lowRated
[Figure: function viewers from Showing to Number]

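To make the shape of such a rule concrete, here is a small Python sketch of an event-condition-action dispatch for the lowRated example. It assumes that viewersInc denotes the tuples added by the triggering insertion. The `Database` class and its method names are hypothetical and do not reproduce PFL's actual machinery.

```python
class Database:
    """Toy active database with one kind of event: 'insert viewers'."""

    def __init__(self):
        self.viewers = {}        # showing -> number of viewers
        self.low_rated = set()
        self.rules = []          # list of (event, condition, action)

    def on(self, event, condition, action):
        self.rules.append((event, condition, action))

    def insert_viewers(self, pairs):
        inc = list(pairs)                    # plays the role of viewersInc
        self.viewers.update(inc)
        for event, condition, action in self.rules:
            if event == "insert viewers":
                bindings = condition(inc)    # evaluate the condition query
                if bindings:                 # fire only if it is non-empty
                    action(self, bindings)


def low_audience(inc):
    """[s | (s,n) ∈ viewersInc; n < 500000]"""
    return [s for (s, n) in inc if n < 500_000]


def add_to_low_rated(db, showings):
    db.low_rated.update(showings)


db = Database()
db.on("insert viewers", low_audience, add_to_low_rated)
db.insert_viewers([("S1", 300_000), ("S2", 2_000_000)])
```

After the insertion only `S1` is recorded in `db.low_rated`, since `S2` is above the threshold.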
PFL’s ECA rule execution semantics
We specified these in PFL itself, to experiment before implementing:
  execSched (db,s) =
    if s = []
    then (db,[])
    else execSched (schedRules (exec (head s,db),s))

  schedRules (db,a:s) =
    let (db,pre,suf) =
      fold schedRule (db,[],[]) (triggers a) in
    (db,pre ++ s ++ suf)

  schedRule i (db,pre,suf) =
    if (eval (event-condition-query i) db) = {}
    then (db,pre,suf)
    else updateSched (actions i,mode i,db,pre,suf)

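The scheduling behaviour can be tried out with a Python sketch along the same lines. Two things are assumptions rather than part of the definitions on the slide: updateSched is taken to append a rule's actions to pre for Immediate mode and to suf for Deferred mode, and the database is modelled as a plain list that logs executed action names. The recursion of execSched is written as a loop.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Action:
    name: str                        # identifies which rules the action triggers
    run: Callable[[list], list]      # db -> new db


@dataclass(frozen=True)
class Rule:
    triggered_by: str                         # name of the triggering action
    query: Callable[[list], list]             # event-condition query
    actions: Callable[[list], List[Action]]   # bindings -> actions to schedule
    mode: str                                 # "Immediate" or "Deferred"


def sched_rule(rule, db, pre, suf):
    bindings = rule.query(db)
    if not bindings:                 # empty query result: rule does not fire
        return pre, suf
    acts = rule.actions(bindings)
    if rule.mode == "Immediate":     # assumed behaviour of updateSched
        return pre + acts, suf
    return pre, suf + acts


def sched_rules(db, a, s, rules):
    pre, suf = [], []
    for rule in rules:               # rules are listed in priority order
        if rule.triggered_by == a.name:
            pre, suf = sched_rule(rule, db, pre, suf)
    return pre + s + suf


def exec_sched(db, schedule, rules):
    while schedule:
        head, rest = schedule[0], schedule[1:]
        db = head.run(db)
        schedule = sched_rules(db, head, rest, rules)
    return db, []


def log_action(name):
    """Hypothetical action that just records its own name in the db."""
    return Action(name, lambda db: db + [name])


rules = [
    Rule("a", lambda db: ["x"], lambda b: [log_action("act2")], "Immediate"),
    Rule("a", lambda db: ["x"], lambda b: [log_action("act3")], "Deferred"),
]
final_db, remaining = exec_sched([], [log_action("a"), log_action("b")], rules)
```

With the schedule `[a, b]`, the immediate action runs directly after `a` and the deferred one after `b`, so `final_db` ends up as `["a", "act2", "b", "act3"]`.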
Analysing and Optimising ECA rules
• Techniques are needed for analysing and optimising the behaviour of ECA rules
• In a project that started in the late 1990s, we have been using the functional semantics of ECA rule execution as the basis for developing such techniques (with James Bailey, Simon Courtenage, Pete Newson)
• In particular, we have been investigating abstract interpretation and partial evaluation of the rule execution semantics for analysis and optimisation, respectively.

Abstract execution semantics
  execSched* (db*,s*) =
    if s* = []
    then (db*,[])
    else execSched* (schedRules* (exec* (head s*,db*),s*))

  schedRules* (db*,a*:s*) =
    let (db*,pre*,suf*) =
      fold schedRule* (db*,[],[]) (triggers a*) in
    (db*,pre* ++ s* ++ suf*)

  schedRule* i (db*,pre*,suf*) =
    if (eval* (event-condition-query i) db*) = False
    then (db*,pre*,suf*)
    else updateSched (actions i,mode i,db*,pre*,suf*)

Correctness of the Abstract Execution
• If for all queries q, abstract databases db*, and abstract actions a*:
  • conc (exec* (a*,db*)) is a superset of
    [exec (a,db) | (a,db) ∈ conc (a*,db*)]
  • eval* q db* = False implies that
    for all db in conc db*, eval q db = {}
• then execSched* is a conservative test for
  • rule termination
  • rule unreachability

Partial Evaluation of Rule Execution
Produce a specialised equation for schedRules for each kind of rule action that may appear at the head of the schedule:
  schedRules (db,a1:s) =
    let (db,pre,suf) =
      fold schedRule (db,[],[]) (triggers a1) in
    (db,pre ++ s ++ suf)
  schedRules (db,a2:s) =
    let (db,pre,suf) =
      fold schedRule (db,[],[]) (triggers a2) in
    (db,pre ++ s ++ suf)
  . . .

Partial Evaluation of Rule Execution
Suppose action a1 triggers rule 2 and rule 3 (in that order of priority). Then we can replace triggers a1 above by [2,3] and apply fold, obtaining:
  schedRules (db,a1:s) =
    let (db,pre,suf) =
      schedRule 3 (schedRule 2 (db,[],[])) in
    (db,pre ++ s ++ suf)

Partial Evaluation of Rule Execution
Now we can apply schedRule (assuming rule 2 has Immediate scheduling mode and rule 3 Deferred scheduling mode):
  schedRules (db,a1:s) =
    let (db,pre,suf) =
      if (eval (event-condition-query 2) db) = {}
      then if (eval (event-condition-query 3) db) = {}
           then (db,[],[])
           else (db,[],bind (actions 3) db)
      else if (eval (event-condition-query 3) db) = {}
           then (db,bind (actions 2) db,[])
           else (db,bind (actions 2) db,bind (actions 3) db)
    in (db,pre ++ s ++ suf)

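The effect of the specialisation can be checked on a toy example by comparing a generic scheduler with a version specialised by hand for action a1. The Python sketch below is not an automatic partial evaluator. The `RULES` and `TRIGGERS` tables are hypothetical, and the database is modelled as a dictionary whose entries hold the bindings each query returns.

```python
# Hypothetical rule table: id -> (event-condition query, bound actions, mode).
RULES = {
    2: (lambda db: db["q2"], lambda db: ["act2"], "Immediate"),
    3: (lambda db: db["q3"], lambda db: ["act3"], "Deferred"),
}
TRIGGERS = {"a1": [2, 3]}   # rules triggered by each action, in priority order


def sched_rule(i, state):
    db, pre, suf = state
    query, actions, mode = RULES[i]
    if not query(db):                        # empty result: rule does not fire
        return state
    if mode == "Immediate":
        return (db, pre + actions(db), suf)
    return (db, pre, suf + actions(db))


def sched_rules_generic(db, a, s):
    """Generic equation: fold sched_rule over the triggered rules."""
    state = (db, [], [])
    for i in TRIGGERS.get(a, []):
        state = sched_rule(i, state)
    db, pre, suf = state
    return (db, pre + s + suf)


def sched_rules_a1(db, s):
    """Equation specialised for a1: the fold and the mode tests are unfolded."""
    q2, act2, _ = RULES[2]
    q3, act3, _ = RULES[3]
    if not q2(db):
        if not q3(db):
            pre, suf = [], []
        else:
            pre, suf = [], act3(db)
    else:
        if not q3(db):
            pre, suf = act2(db), []
        else:
            pre, suf = act2(db), act3(db)
    return (db, pre + s + suf)
```

The specialised version no longer consults `TRIGGERS` or the mode field at run time. Only the two query evaluations remain.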
Heterogeneous Databases
• So far, I have been discussing single databases
• However, larger-scale applications may need to integrate information from several databases, possibly supporting different data models
• To integrate information stored in such heterogeneous databases it is necessary to form a single, integrated schema
• Conflicts may exist between the various source schemas, which must be removed by applying transformations to these schemas

Graphs for Schema Transformation
• In work with Peter McBrien started in the late 1990s, we have developed a general framework for transforming and integrating heterogeneous database schemas
• We represent schemas expressed in higher-level data models, such as relational or object-oriented, in terms of a nested-graph data model, thus allowing us to transform between different data models
• In our schema transformation framework, new schema constructs are defined using queries over existing constructs
• In our framework, schema transformations are reversible, thus allowing query and data translation between schemas:
  addSubClass Film Prog
  addSubClass Doc Prog
  addSubClass Series Prog
  addClass Series [p | (p,S) ∈ category]
  addClass Doc [p | (p,D) ∈ category]
  addClass Film [p | (p,F) ∈ category]
  addClass Prog [p | (p,c) ∈ category]
  delRel category [(p,F) | p ∈ Film] U
                  [(p,D) | p ∈ Doc] U
                  [(p,S) | p ∈ Series]

  addConstraint subset Film Prog
  addConstraint subset Doc Prog
  addConstraint subset Series Prog
  addNode Series [p | (p,S) ∈ category]
  addNode Doc [p | (p,D) ∈ category]
  addNode Film [p | (p,F) ∈ category]
  addNode Prog [p | (p,c) ∈ category]
  delEdge category [(p,F) | p ∈ Film] U
                   [(p,D) | p ∈ Doc] U
                   [(p,S) | p ∈ Series]
  delNode Programme Prog
  delNode Category [F,D,S]

  delConstraint subset Film Prog
  delConstraint subset Doc Prog
  delConstraint subset Series Prog
  delNode Series [p | (p,S) ∈ category]
  delNode Doc [p | (p,D) ∈ category]
  delNode Film [p | (p,F) ∈ category]
  delNode Prog [p | (p,c) ∈ category]
  addEdge category [(p,F) | p ∈ Film] U
                   [(p,D) | p ∈ Doc] U
                   [(p,S) | p ∈ Series]
  addNode Programme Prog
  addNode Category [F,D,S]

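Comparing the last two slides, every add step has become a del step with the same construct and query, and vice versa. A minimal Python sketch of that reversal is given below. Representing a step as an (operation, construct, query) triple and applying the reversed steps in the opposite order are assumptions of this sketch, as the slides show the steps without an explicit ordering.

```python
SWAP = {"add": "del", "del": "add"}


def reverse_step(step):
    """Turn addX into delX (and delX into addX), keeping construct and query."""
    op, construct, query = step
    prefix, kind = op[:3], op[3:]
    return (SWAP[prefix] + kind, construct, query)


def reverse(transformation):
    """Reverse a whole transformation: swap each step and invert the order."""
    return [reverse_step(step) for step in reversed(transformation)]


# A fragment of the graph-level transformation, with queries kept as text.
forward = [
    ("addNode", "Film", "[p | (p,F) ∈ category]"),
    ("addConstraint", "subset Film Prog", ""),
    ("delNode", "Category", "[F,D,S]"),
]
backward = reverse(forward)
```

Keeping the query with each delete step is what makes the reversal possible: the query says how to rebuild the deleted construct from what remains.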
Query Translation
• Given a transformation from a schema S1 to a schema S2, and a query Q on S1, we use the delete transformation steps to substitute for constructs of S1 which are not in S2, e.g. from the previous slide:
    [title p | p ∈ Film U Doc]
  on S1 [figure not recovered] translates into
    [title p | p ∈ [p | (p,F) ∈ category] U
                   [p | (p,D) ∈ category]]
  on S2 [figure not recovered]

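The substitution can be imitated in Python by keeping, for each deleted construct, the query that defines it over the target schema. In this sketch the `CATEGORY` and `TITLE` data are hypothetical, and the substitution happens when the query is evaluated rather than by rewriting its text.

```python
# Hypothetical data for the schema that still has the category relationship.
CATEGORY = {("P1", "F"), ("P2", "D"), ("P3", "S")}
TITLE = {"P1": "Film one", "P2": "Documentary two", "P3": "Series three"}

# Queries taken from the delete steps: each deleted class is defined
# in terms of the category relationship.
DELETED = {
    "Film": lambda category: {p for (p, c) in category if c == "F"},
    "Doc": lambda category: {p for (p, c) in category if c == "D"},
    "Series": lambda category: {p for (p, c) in category if c == "S"},
}


def query_on_classes(film, doc):
    """[title p | p ∈ Film U Doc], phrased over the class-based schema."""
    return {TITLE[p] for p in film | doc}


def translated_query(category):
    """Same query after substituting the defining queries for Film and Doc."""
    return query_on_classes(DELETED["Film"](category), DELETED["Doc"](category))
```

Evaluating `translated_query(CATEGORY)` gives the titles of the film and the documentary, without the classes Film and Doc ever being stored.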
Functions for Database Integration
• In the formal specification of our framework, each schema transformation is a function
    t : Database → Database
  where a database consists of schema+data
• We are currently implementing our framework within the Automed project
• We are planning to handle query language heterogeneity in Automed by translation into/from a functional intermediate query language

Future Research
• Extending Automed to also handle materialised views and view updates, leading to a data warehousing approach to data integration
• Data warehousing of genomic data (in collaboration with Profs Thornton, Orengo, Barton, and Drs Keller, Martin, Shepherd)
• Moving beyond database integration and database dynamics to data integration on the Web and Web dynamics:
  • handling XML data sources within Automed
  • developing an ECA rule language for XML