Critique of Relational Database Models
Why relational?
Relational, network and CODASYL DBs
Advantages of RDBs classified
5/22/2017
1
CS319 Theory of Databases
Orientation / schedule for module 2005
Wk 1-2   Generalities on databases         3
Wk 2-6   Relational database theory       13
Wk 7     Evaluating relational databases   3
Wk 8     SQL and object-relational DBs     4
Wk 9     Temporal Relational Databases     4
Wk 10    Reflection on DBs                 3
Hugh Darwen in weeks 8 and 9
Week 8 - Monday 2pm + 5pm, Thursday 2pm + 5pm
Week 9 - Monday 2pm + 5pm, Thursday 2pm + 5pm
Why relational?
C.J. Date
Relational Database Writings 1985-1989
Purpose of the paper ...
... a succinct and reasonably comprehensive summary of
the main advantages of the relational approach
… concerned with technical not business advantages
… to evaluate relational models in DBs fully we must
also consider the most fundamental issues
The agenda for reading Why Relational?
Where is Date coming from? What is his bias?
How do we classify Date's perceived virtues of relational
models? Some virtues differ in nature from others ...
To what extent are the qualities of relational databases
fundamentally to do with relations?
What is the future for databases as a concept?
Orientation on the issues raised by Date
Paper has a rationale behind it - to defend relational
models from emerging new technologies (c. 1989)
Date has a long history as a relational DB champion
Even the initial claim of the paper is contested (by 1989)
First and primary advantage of RDB model: simplicity
Issue: are SQL and ORACLE simple ... ?
… but with what is it being compared?
Context: candidate abstract data models
3 classical models:
hierarchical
e.g. Information Management System (IMS)
developed late 1960s for Apollo mission
network
Conference on Data Systems Languages
CODASYL : standardised COBOL
CODASYL : Database Task Group (DBTG)
Official CODASYL reports 1971-1978
Context: candidate abstract data models (cont.)
3 classical models:
hierarchical, network, ...
relational
proposed by E.F. Codd in 1970
E.F. Codd was at IBM San Jose RL
Examples:
System R [Sequel -> SQL],
Ingres [Quel], QBE, PRTV [ISBL]
Commercial Relational Systems in 1980s
Context: Other Candidate Models
Clear that relational databases are good for many
commercial enterprises involved in data processing
What about other applications? Do they need different models?
• interactive design
human interaction & intervention essential in design
• real-time applications
need fast response, no encoding overheads
• integrated project support environments
need to store pieces of code, diagrams etc.
Context: Other Candidate Models
Possible alternative approaches
Extensions to relational e.g. deductive dbs
Datalog (proper subset of Prolog)
logic language
cf. Kowalski Logic for Problem Solving
object-oriented databases
application of OOP to DBs dates from late 1980s: e.g.
Orion, Kim, Cactis, Gemstone, O2, Iris
Putting Date's view in context ....
• is Date biased?
list of advantages could go on for ever, or at least for
a very long time (p3)
anywhere from 5-fold to 20-fold increases in
productivity (p5) cf. quotes from other sources ...
tables are sufficient, in the sense that there is no
known data that cannot be represented in tabular
form (p5) (what about "the Mona Lisa", or "the sound
of the last act of Marriage of Figaro"?)
Useful to put Date's view in historical context
Brief history establishes the historical context ….
CODASYL databases on the network model
Outline of a network model for the HVFC
MEMBERS (NAME, ADDRESS, BALANCE)
ORDERS (ORDER_NO, NAME, ITEM, QUANTITY)
SUPPLIERS (SNAME, SADDR, ITEM, PRICE)
Develop an entity-relationship diagram ...
CODASYL databases on the network model 1
Network model for the HVFC:
MEMBERS (NAME, ADDRESS, BALANCE)
ORDERS (ORDER_NO, NAME, ITEM, QUANTITY)
SUPPLIERS (SNAME, SADDR, ITEM, PRICE)
Develop an entity-relationship diagram … have two
many-many relationships
SUPPLIES (SUPPLIERS, ITEMS)
ORDERS (MEMBERS, ITEMS)
Principle of querying in a CODASYL model
• replace many-many relationships by functions
• navigate around sets of records via functions
CODASYL databases on the network model 2
A many-many relationship X <-> Y can be expressed as
a^-1 . b where a: R -> X & b: R -> Y are many-one functions
Example: to factorise many-many relationship in HVFC
ORDERS (MEMBERS, ITEMS)
Introduce a set of records to represent ORDERS
Typical record is (m_name, i_name, quantity)
CODASYL databases on the network model 3
A many-many relationship X <-> Y can be expressed as
a^-1 . b where a: R -> X & b: R -> Y are many-one functions
Factorise ORDERS into two projection maps:
MEMBORD : ORDERS -> MEMBERS
ITEMORD : ORDERS -> ITEMS
where
MEMBORD (m_name, i_name, quantity) = m_name
ITEMORD (m_name, i_name, quantity) = i_name
Represent many-many ORDERS relationship by
MEMBORD^-1 . ITEMORD
by combining the two projections thus:
MEMBERS <- ORDERS -> ITEMS
A Sample CODASYL query
"Find how much Granola Brooks has ordered"
NAME := "Brooks"
FIND MEMBERS RECORD USING CALC-KEY
LOOP: repeat forever
FIND NEXT ORDERS RECORD IN CURRENT MEMBORD SET
if FAIL then break LOOP
FIND OWNER OF CURRENT ITEMORD SET
GET ITEMS; INAME
if ITEMS.INAME = "Granola" then do
FIND CURRENT OF ORDERS RECORD
GET ORDERS; QUANTITY
print QUANTITY
break LOOP
end
end LOOP
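For readers unfamiliar with DBTG navigation, the same traversal can be imitated in Python. This is a rough sketch under stated assumptions: the classes, the dictionary standing in for the CALC-key hash, and the sample data are all hypothetical, and real CODASYL currency indicators are far richer than a loop variable.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    iname: str


@dataclass
class Order:
    quantity: int
    item: Item  # owner of the ITEMORD set occurrence this order belongs to


@dataclass
class Member:
    name: str
    # member records of this MEMBERS record's MEMBORD set occurrence
    orders: list = field(default_factory=list)


# Hypothetical sample database.
granola = Item("Granola")
flour = Item("Unbleached Flour")
brooks = Member("Brooks", [Order(10, flour), Order(5, granola)])
MEMBERS_BY_CALC_KEY = {brooks.name: brooks}  # stands in for hashed access


def quantity_ordered(name, iname):
    member = MEMBERS_BY_CALC_KEY[name]  # FIND MEMBERS RECORD USING CALC-KEY
    for order in member.orders:         # FIND NEXT ORDERS RECORD IN CURRENT MEMBORD SET
        item = order.item               # FIND OWNER OF CURRENT ITEMORD SET
        if item.iname == iname:         # GET ITEMS; INAME and compare
            return order.quantity       # GET ORDERS; QUANTITY
    return None                         # FAIL: no further ORDERS records in the set


print(quantity_ordered("Brooks", "Granola"))
```

The programmer, not the system, decides the access path: the loop walks pointers record by record.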
Commentary on the CODASYL query 1
NAME := "Brooks"
> find the MEMBERS record associated with Brooks
> assume stored by CALC-KEY (hash-code) NAME
FIND MEMBERS RECORD USING CALC-KEY
LOOP: repeat forever
FIND NEXT ORDERS RECORD
IN CURRENT MEMBORD SET
> traverse link MEMBORD: ORDERS -> MEMBERS
> current MEMBERS record is Brooks’s
> so the link leads to his orders
if FAIL then break LOOP
FIND OWNER OF CURRENT ITEMORD SET
> apply link ITEMORD: ORDERS -> ITEMS
> to determine what item was ordered
Commentary on the CODASYL query (cont.)
...
> apply link ITEMORD: ORDERS -> ITEMS
> to determine what item was ordered
GET ITEMS; INAME
> access name of the item ordered
if ITEMS.INAME = "Granola" then do
> check to see if item ordered is Granola
FIND CURRENT OF ORDERS RECORD
> current orders record is order by Brooks of Granola
GET ORDERS; QUANTITY
> access quantity of Granola ordered by Brooks
print QUANTITY
break LOOP
end
end LOOP
About the CODASYL environment
Issue: are SQL and ORACLE simple ... ?
“ The sheer range of FIND commands and their almost
Byzantine intricacy is one of the reasons why DBTG
databases are programmed by experts … ”
“ The efficiency of CODASYL implementations for
performing access and update has been a very large factor
in their widespread use. This efficiency has been
purchased at the cost of using a baffling variety of storage
strategies and DML commands … ”
Peter Gray: Logic, Algebra and Databases
SQL is simple - relative to CODASYL
ORDERS (ORDER_NO, NAME, ITEM, QUANTITY)
"Find how much Granola Brooks has ordered"
select QUANTITY
from ORDERS
where NAME='Brooks' and ITEM='Granola'
The SQL-CODASYL comparison highlights the reason for the
Date-Darwen concern about ‘back-to-the-future’ in DBs
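To try the query, one option is Python's built-in sqlite3 module. This is a minimal sketch: the column types and the sample rows are assumptions, since the slides give only the relation scheme.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table ORDERS "
    "(ORDER_NO integer primary key, NAME text, ITEM text, QUANTITY integer)"
)
# Hypothetical sample rows.
conn.executemany(
    "insert into ORDERS values (?, ?, ?, ?)",
    [
        (1, "Brooks", "Granola", 5),
        (2, "Brooks", "Unbleached Flour", 10),
        (3, "Robin", "Granola", 3),
    ],
)

# The query from the slide, with the two constants passed as parameters.
rows = conn.execute(
    "select QUANTITY from ORDERS where NAME = ? and ITEM = ?",
    ("Brooks", "Granola"),
).fetchall()
print(rows)
```

The query states what is wanted and says nothing about how ORDERS is stored or traversed.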
Why relational? 1
CODASYL is bad, but is relational good?
[ also beware! CODASYL is bad, but is network bad? ]
... first try to understand Date's claims by comparing the
two models ....
Areas of usefulness for relational model:
data manipulation
database design
database definition
database installation
....
Why relational? 2
Advantages of relational technology:
usability
productivity
... promotes end-user programming
Evident in relation to the CODASYL alternative!
cf. Korth and Silberschatz: file system vs DBMS
Why relational? 3
Perceived advantages of relational DBs:
• simple data structure
• simple operators
• no frivolous distinctions
• SQL support
• the view mechanism
• sound theoretical base
• small number of concepts
• the dual-mode principle
• physical data independence
• logical data independence
Why relational? 4
Perceived advantages of relational DBs (cont.):
• ease of application development
• dynamic data definition
• ease of installation and ease of operations
• simplified database design
• integrated dictionary
• distributed database support
• performance
• extendability
… all evident in relation to CODASYL comparison
A brief elaboration of Date's concerns 1
• simple data structure
table is the basis of the relational model
• simple operators
5 relational operators for completeness
set-level operations / closure / declarative
• no frivolous distinctions
uniform methods of interaction with DB
e.g. for update relation, or impose constraint
• SQL support
high-level queries / widespread use, acceptance
• the view mechanism
means to customise the DB without new concepts
• sound theoretical base
relational model is mathematically rigorous
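The point about views can be illustrated with sqlite3. This is a sketch only: the view name GRANOLA_ORDERS, the column types and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table ORDERS
        (ORDER_NO integer primary key, NAME text, ITEM text, QUANTITY integer);
    insert into ORDERS values (1, 'Brooks', 'Granola', 5);
    insert into ORDERS values (2, 'Brooks', 'Unbleached Flour', 10);
    insert into ORDERS values (3, 'Robin', 'Granola', 3);

    -- A customised window on the data, defined with an ordinary query.
    create view GRANOLA_ORDERS as
        select NAME, QUANTITY from ORDERS where ITEM = 'Granola';
    """
)

# The view is queried exactly as if it were a base table.
view_rows = conn.execute(
    "select NAME, QUANTITY from GRANOLA_ORDERS order by NAME"
).fetchall()
print(view_rows)
```

A view is itself a table derived by a query, which is why it needs no new concepts.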
A brief elaboration of Date's concerns 2
• small number of concepts
single mode of representation + uniform update
cf multi-mode + proliferation of mechanisms
• the dual-mode principle
embedded DML to access the DB from programs
autonomous activity resembles user interaction
• physical data independence
separate conceptual model / physical database
• logical data independence
separate conceptual model / user views
• ease of application development
makes application generators possible
makes high-level prototyping easy
• dynamic data definition
can modify a relational DB design incrementally
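Physical data independence and dynamic data definition can be tried out with sqlite3. This is a hedged sketch: the index name, the added ORDER_DATE column and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table ORDERS
        (ORDER_NO integer primary key, NAME text, ITEM text, QUANTITY integer);
    insert into ORDERS values (1, 'Brooks', 'Granola', 5);
    insert into ORDERS values (2, 'Robin', 'Granola', 3);
    """
)

QUERY = "select QUANTITY from ORDERS where NAME = 'Brooks' and ITEM = 'Granola'"

before = conn.execute(QUERY).fetchall()

# Physical change: add an access path. The query text is untouched.
conn.execute("create index ORDERS_BY_NAME_ITEM on ORDERS (NAME, ITEM)")
after_index = conn.execute(QUERY).fetchall()

# Incremental design change: add a column to a table already in use.
conn.execute("alter table ORDERS add column ORDER_DATE text")
after_alter = conn.execute(QUERY).fetchall()

print(before, after_index, after_alter)
```

The same query text gives the same answer before and after both changes. Whether the index is used is left to the optimiser.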
A brief elaboration of Date's concerns 3
• ease of installation and ease of operations
robust, easy to manage by few personnel
• simplified database design
have principles for database design
• integrated dictionary
consistent interface for meta-level access
metadata-driven programs can be written
• distributed database support
high semantic content of queries, declarative nature
cf. problems of breaking up procedural chains
• performance
down to optimiser, not applications programmer
• extendability
can easily build on relational database models
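A metadata-driven program can be sketched with sqlite3, whose catalogue is itself queried with SQL. The helper name describe and the column types are assumptions. The sqlite_master table and the table_info pragma are SQLite-specific, and other systems expose their dictionaries differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table MEMBERS (NAME text, ADDRESS text, BALANCE real);
    create table ORDERS (ORDER_NO integer, NAME text, ITEM text, QUANTITY integer);
    create table SUPPLIERS (SNAME text, SADDR text, ITEM text, PRICE real);
    """
)


def describe(connection):
    """Read table and column names from the catalogue, not from hard-coded knowledge."""
    tables = [
        name
        for (name,) in connection.execute(
            "select name from sqlite_master where type = 'table' order by name"
        )
    ]
    # Each table_info row is (cid, name, type, notnull, dflt_value, pk).
    return {
        t: [row[1] for row in connection.execute(f"pragma table_info({t})")]
        for t in tables
    }


catalogue = describe(conn)
print(catalogue)
```

A generic report or form generator could be driven from this dictionary alone.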
Date's concerns and CODASYL 1
• simple data structure?
cf complexity of DBTG sets and pointers
• simple operators?
no high-level operators, nothing at the set-level
have to record state, pointers create modes
• no frivolous distinctions?
complex methods of interaction with DB
e.g. update relation and impose constraint would be
dealt with in entirely separate ways
• SQL support?
no concept of high-level query (though CODASYL was in widespread use!)
• the view mechanism?
has no analogue for CODASYL
• sound theoretical base?
no discernible theory in CODASYL framework
Date's concerns and CODASYL 2
• small number of concepts?
multi-mode representation + proliferation of mechanisms:
many ways to select, insert, delete, update
• the dual-mode principle?
no clear distinction between high-level queries and
application programmer's mode of access
• physical data independence?
conceptual model mixed with physical database
• logical data independence?
no provision for user views
• ease of application development?
CODASYL doesn't make data access much easier
• dynamic data definition?
DB design has to be carefully preconceived and can't
easily be adapted
Date's concerns and CODASYL 3
• ease of installation and ease of operations?
CODASYL probably keeps program surgery busy
• simplified database design?
principles for database design more suspect
• integrated dictionary?
meta-level issues not addressed within model
• distributed database support?
who'd like to parallelise CODASYL updates?
• performance?
was traditionally better than relational models!
• extendability?
CODASYL not something to be built on ...
Interpreting Date’s defence of relational models
Date’s arguments in defence of relational models are
very powerful when seen in the context of CODASYL
Need to understand them in relation to what might be the
best data modelling practices for today and the future
Important for this purpose to classify the defences:
• defence from theory
? is the theory adequate
• defence from practice
? will the practice change
• special qualities exhibited by the relational model
? are they particular to RDBs, or generalisable
Classifying the advantages cited by Date 1
Will classify Date’s list of advantages into
THEORY, PRINCIPLES and CONSEQUENCES
and further subdivide
PRINCIPLES into PRACTICAL & FOUNDATIONAL
THEORY
• simple data structure
• simple operators
• no frivolous distinctions
• sound theoretical base
• small number of concepts
Classifying the advantages cited by Date 2
PRINCIPLES - PRACTICAL ASPECT
• SQL support
• the view mechanism
• the dual-mode principle
• physical data independence
• logical data independence
• dynamic data definition
PRINCIPLES - FOUNDATIONAL ASPECT
• simplified database design
• integrated dictionary: metadata-driven
• distributed database support: atomicity
Classifying the advantages cited by Date 3
CONSEQUENCES
• ease of application development
• ease of installation and ease of operations
• performance
• extendability
The status of these advantages is relevant when we
come to consider what is really significant about the
relational model in comparison with other alternatives ...
… will return to express personal views concerning the
defence of the relational position later … turn next to
the issue of ‘Why not Relational?’
 whynotrel.ppt
What are the virtues of the relational model? 1
Certain features of relational models we wish to retain ...
The defence from theory …
• simple data structure
want elegant and consistent structures
• simple operators
want high-level operators
need techniques at the set-level
don't want to have to record state
don't want to maintain pointers
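Set-level operators with closure can be sketched in Python, assuming the five operators meant are the usual primitives of relational algebra: selection, projection, union, difference and Cartesian product. Relations are modelled as plain sets of tuples and attributes by position, which is a simplification, and the sample data is hypothetical.

```python
def select(rel, pred):
    """Selection: the tuples of rel that satisfy pred."""
    return {t for t in rel if pred(t)}


def project(rel, positions):
    """Projection: keep only the listed attribute positions."""
    return {tuple(t[i] for i in positions) for t in rel}


def union(r, s):
    return r | s


def difference(r, s):
    return r - s


def product(r, s):
    """Cartesian product: concatenate every tuple of r with every tuple of s."""
    return {a + b for a in r for b in s}


# Hypothetical ORDERS relation: (ORDER_NO, NAME, ITEM, QUANTITY).
ORDERS = {
    (1, "Brooks", "Granola", 5),
    (2, "Brooks", "Unbleached Flour", 10),
    (3, "Robin", "Granola", 3),
}

# Closure: each operator returns a relation, so operators compose freely.
quantity = project(
    select(ORDERS, lambda t: t[1] == "Brooks" and t[2] == "Granola"), [3]
)
print(quantity)
```

No loop state or pointers appear in the query expression itself. Each step consumes whole relations and yields a whole relation.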
What are the virtues of the relational model? 2
Certain features of relational models we wish to retain ...
The defence from theory …
• small number of concepts
want a unified view for representation
uniform ways to manipulate
• sound theoretical base
want to be able to apply mathematical techniques
... but all these attributes apply to Miranda, for example,
and this hasn't made it widely / wildly successful
What are the virtues of the relational model? 3
The defence from practice ...
• SQL support
need concept of high-level query
• the view mechanism
must be able to represent different user views
• the dual-mode principle
invoking user commands automatically is a powerful
principle for program development and debugging
• physical and logical data independence
must be possible to separate concerns
at high and low levels of abstraction
... but do these qualities fit into a general scheme or are
they specific to the relational framework?
What are the virtues of the relational model? 4
Evidence of special suitability for real-world modelling ...
• simplified database design
have principles for database design
contrast the messiness of CODASYL
• integrated dictionary
can write metadata-driven programs
no chance to take high-level view in CODASYL
• distributed database support
high semantic content of queries, atomicity of action
queries in CODASYL not much about the real-world
What are the virtues of the relational model? 5
Evidence of special suitability for real-world modelling ...
... database design reveals very direct connections
between dependencies amongst attributes of real-world
objects and forms for their representation in relation schemes
content (= real-world meaning) dictating
form (= structure of the representation)
Fundamental conflict between theory and practice
over the relationship between form and content
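One way dependencies dictate form is through attribute closure, the standard computation behind key-finding in relational design theory. The sketch below assumes a single functional dependency, ORDER_NO -> NAME, ITEM, QUANTITY, on the ORDERS scheme. That dependency is an assumption for illustration and is not stated in the slides.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


ORDERS_SCHEME = {"ORDER_NO", "NAME", "ITEM", "QUANTITY"}

# Assumed dependency: an order number identifies the whole order.
FDS = [(frozenset({"ORDER_NO"}), frozenset({"NAME", "ITEM", "QUANTITY"}))]

order_no_closure = closure({"ORDER_NO"}, FDS)
name_closure = closure({"NAME"}, FDS)
print(order_no_closure == ORDERS_SCHEME, name_closure)
```

Under the assumed dependency, ORDER_NO determines the whole scheme and so can serve as a key, while NAME alone determines nothing further. The real-world fact fixes the shape of the relation.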
What are the virtues of the relational model? 6
Important aspects of relational DBs (in WMB’s view)
THEORY aspect
underlying algebraic model
• provides basis for unambiguous evaluation
• closure properties
• potential for optimisation & axiomatisation
PRINCIPLES represented in
views + application generators + spreadsheets
What are the virtues of the relational model? 7
Important aspects of relational DBs (in WMB’s view)
PRACTICAL aspect
• involve state essentially, so not purely declarative
• good for expressing agent actions / views
• good for representing levels of abstraction
cf ACE & A Small Matter of Programming, Bonnie Nardi
Represents a framework for managing state that is cleaner
than procedural programming and more expressive than FP
What are the virtues of the relational model? 8
Important aspects of relational DBs (in WMB’s view)
FOUNDATIONAL aspect
• concerned with metaphor not symbolic representation
• invokes form and content in combination
Notes on these respective issues
• metaphor: the form reflects the content
[as is true to some degree of relational models]
• cf logicism debate in AI:
A Critique of Pure Reason McDermott et seq
Issues for database development 1
How to avoid "back to the future"?
• need theoretical foundation
• need qualities of declarative query
• need principles to handle abstraction at many levels:
data independence
• need to support interaction of agents at high-levels of
abstraction
• need to retain / replace the form-content relationships
that relational DB design theory introduces
Issues for database development 2
Modern database demands
• enormous volumes of data
• high-performance e.g. for multi-media, real-time
• support for metaphor e.g. visual image not table
• concurrent access, distributed data
• closer integration between direct (human) and
programmed (computer) data access
• support for modern data abstractions: objects,
inheritance, aggregation
• applicability to design environment needs:
incremental intensional change