Critique of Relational Database Models
Why relational? Relational, network and CODASYL DBs. Advantages of RDBs classified.
5/22/2017 1 CS319 Theory of Databases

Orientation / schedule for module 2005
Wk 1-2    Generalities on databases          3
Wk 2-6    Relational database theory         13
Wk 7      Evaluating relational databases    3
Wk 8      SQL and object-relational DBs      4
Wk 9      Temporal Relational Databases      4
Wk 10     Reflection on DBs                  3
Hugh Darwen in weeks 8 and 9:
Week 8 - Monday 2pm + 5pm, Thursday 2pm + 5pm
Week 9 - Monday 2pm + 5pm, Thursday 2pm + 5pm

Why relational?
C.J. Date, Relational Database Writings 1985-1989
Purpose of the paper ...
... a succinct and reasonably comprehensive summary of the main advantages of the relational approach …
… concerned with technical, not business, advantages …
… to evaluate relational models in DBs fully, we must also consider the most fundamental issues

The agenda for reading Why Relational?
Where is Date coming from? What is his bias?
How do we classify Date's perceived virtues of relational models? Some virtues differ in nature from others ...
To what extent are the qualities of relational databases fundamentally to do with relations?
What is the future for databases as a concept?

Orientation on the issues raised by Date
The paper has a rationale behind it: to defend relational models from emerging new technologies (c. 1989).
Date has a long history as a relational DB champion.
Even the initial claim of the paper is contested (by 1989).
First and primary advantage of the RDB model: simplicity.
Issue: are SQL and ORACLE simple ...? … but with what are they being compared?

Context: candidate abstract data models
3 classical models: hierarchical, e.g.
Information Management System (IMS), developed in the late 1960s for the Apollo mission
network: Conference on Data Systems Languages (CODASYL)
CODASYL: standardised COBOL
CODASYL: Database Task Group (DBTG)
Official CODASYL reports 1971-1978

Context: candidate abstract data models (cont.)
3 classical models: hierarchical, network, ...
relational: proposed by E.F. Codd in 1970 (Codd was at IBM San Jose Research Laboratory)
Examples: System R [Sequel -> SQL], Ingres [Quel], QBE, PRTV [ISBL]
Commercial relational systems in the 1980s

Context: Other Candidate Models
It is clear that relational databases are good for many commercial enterprises involved in data processing.
What about other applications? Do they need different models?
• interactive design: human interaction & intervention essential in design
• real-time applications: need fast response, no encoding overheads
• integrated project support environments: need to store pieces of code, diagrams etc.

Context: Other Candidate Models (cont.)
Possible alternative approaches:
• extensions to relational, e.g. deductive DBs
  Datalog (proper subset of Prolog): logic language; cf. Kowalski, Logic for Problem Solving
• object-oriented databases
  application of OOP to DBs, dating from the late 1980s: e.g. Orion (Kim), Cactis, Gemstone, O2, Iris

Putting Date's view in context ....
• is Date biased?
  list of advantages could go on for ever, or at least for a very long time (p3)
  anywhere from 5-fold to 20-fold increases in productivity (p5); cf. quotes from other sources ...
  tables are sufficient, in the sense that there is no known data that cannot be represented in tabular form (p5)
  (what about "the Mona Lisa", or "the sound of the last act of The Marriage of Figaro"?)

Useful to put Date's view in historical context
A brief history establishes the historical context ….
CODASYL databases on the network model
Outline of a network model for the HVFC:
MEMBERS (NAME, ADDRESS, BALANCE)
ORDERS (ORDER_NO, NAME, ITEM, QUANTITY)
SUPPLIERS (SNAME, SADDR, ITEM, PRICE)
Develop an entity-relationship diagram ...

CODASYL databases on the network model 1
The entity-relationship diagram for the HVFC has two many-many relationships:
SUPPLIES (SUPPLIERS, ITEMS)
ORDERS (MEMBERS, ITEMS)
Principle of querying in a CODASYL model:
• replace many-many relationships by functions
• navigate around sets of records via functions

CODASYL databases on the network model 2
A many-many relationship between $X$ and $Y$ can be expressed as $a^{-1} \cdot b$, where $a: R \to X$ and $b: R \to Y$ are many-one functions on a set of records $R$.
Example: to factorise the many-many relationship ORDERS (MEMBERS, ITEMS) in the HVFC, introduce a set of records to represent ORDERS. A typical record is (m_name, i_name, quantity).

CODASYL databases on the network model 3
Factorise ORDERS into two projection maps:
$\mathrm{MEMBORD} : \mathrm{ORDERS} \to \mathrm{MEMBERS}$, where MEMBORD (m_name, i_name, quantity) = m_name
$\mathrm{ITEMORD} : \mathrm{ORDERS} \to \mathrm{ITEMS}$, where ITEMORD (m_name, i_name, quantity) = i_name
Represent the many-many ORDERS relationship by $\mathrm{MEMBORD}^{-1} \cdot \mathrm{ITEMORD}$, combining the two projections thus:
$\mathrm{MEMBERS} \xleftarrow{\mathrm{MEMBORD}} \mathrm{ORDERS} \xrightarrow{\mathrm{ITEMORD}} \mathrm{ITEMS}$

A sample CODASYL query
"Find how much Granola Brooks has ordered"

    NAME := "Brooks"
    FIND MEMBERS RECORD USING CALC-KEY
    LOOP: repeat forever
        FIND NEXT ORDERS RECORD IN CURRENT MEMBORD SET
        if FAIL then break LOOP
        FIND OWNER OF CURRENT ITEMORD SET
        GET ITEMS; INAME
        if ITEMS.INAME = "Granola" then do
            FIND CURRENT OF ORDERS RECORD
            GET ORDERS; QUANTITY
            print QUANTITY
            break LOOP
        end
    end LOOP

Commentary on the CODASYL query

    NAME := "Brooks"
        > find the MEMBERS record associated with Brooks
        > assume stored by CALC-key (hash-code) NAME
    FIND MEMBERS RECORD USING CALC-KEY
    LOOP: repeat forever
        FIND NEXT ORDERS RECORD IN CURRENT MEMBORD SET
            > traverse link MEMBORD: ORDERS → MEMBERS
            > current MEMBERS record is Brooks's
            > link to his orders
        if FAIL then break LOOP
        FIND OWNER OF CURRENT ITEMORD SET
            > apply link ITEMORD: ORDERS → ITEMS
            > to determine what item was ordered
        GET ITEMS; INAME
            > access name of the item ordered
        if ITEMS.INAME = "Granola" then do
            > check to see if the item ordered is Granola
            FIND CURRENT OF ORDERS RECORD
                > current ORDERS record is the order by Brooks of Granola
            GET ORDERS; QUANTITY
                > access quantity of Granola ordered by Brooks
            print QUANTITY
            break LOOP
        end
    end LOOP

About the CODASYL environment
Issue: are SQL and ORACLE simple ...?
"The sheer range of FIND commands and their almost Byzantine intricacy is one of the reasons why DBTG databases are programmed by experts …"
"The efficiency of CODASYL implementations for performing access and update has been a very large factor in their widespread use.
This efficiency has been purchased at the cost of using a baffling variety of storage strategies and DML commands …"
(Peter Gray, Logic, Algebra and Databases)

SQL is simple, relative to CODASYL
ORDERS (ORDER_NO, NAME, ITEM, QUANTITY)
"Find how much Granola Brooks has ordered"

    select QUANTITY
    from ORDERS
    where NAME = 'Brooks' and ITEM = 'Granola'

The SQL-CODASYL comparison highlights the reason for the Date-Darwen concern about 'back to the future' in DBs.

Why relational? 1
CODASYL is bad, but is relational good?
[Also beware! CODASYL is bad, but is the network model bad?]
... first try to understand Date's claims by comparing the two models ....
Areas of usefulness for the relational model:
data manipulation, database design, database definition, database installation ....

Why relational? 2
Advantages of relational technology:
usability, productivity ... promotes end-user programming
Evident in relation to the CODASYL alternative!
cf. Korth and Silberschatz: file system vs DBMS

Why relational? 3
Perceived advantages of relational DBs:
• simple data structure
• simple operators
• no frivolous distinctions
• SQL support
• the view mechanism
• sound theoretical base
• small number of concepts
• the dual-mode principle
• physical data independence
• logical data independence

Why relational? 4
Perceived advantages of relational DBs (cont.):
• ease of application development
• dynamic data definition
• ease of installation and ease of operations
• simplified database design
• integrated dictionary
• distributed database support
• performance
• extendability
… all evident in relation to the CODASYL comparison

A brief elaboration of Date's concerns 1
• simple data structure: the table is the basis of the relational model
• simple operators: 5 relational operators for completeness; set-level operations / closure / declarative
• no frivolous distinctions: uniform methods of interaction with the DB, e.g. to update a relation or impose a constraint
• SQL support: high-level queries / widespread use, acceptance
• the view mechanism: a means to customise the DB without new concepts
• sound theoretical base: the relational model is mathematically rigorous

A brief elaboration of Date's concerns 2
• small number of concepts: single mode of representation + uniform update; cf. multi-mode + proliferation of mechanisms
• the dual-mode principle: embedded DML to access the DB from programs; autonomous activity resembles user interaction
• physical data independence: separate conceptual model / physical database
• logical data independence: separate conceptual model / user views
• ease of application development: makes application generators possible; makes high-level prototyping easy
• dynamic data definition: can modify a relational DB design incrementally

A brief elaboration of Date's concerns 3
• ease of installation and ease of operations: robust, easy to manage by few personnel
• simplified database design: have principles for database design
• integrated dictionary: consistent interface for meta-level access; metadata-driven programs can be written
• distributed database support: high semantic content of queries, declarative nature; cf. problems of breaking up procedural chains
• performance: down to the optimiser, not the applications programmer
• extendability: can easily build on relational database models

Date's concerns and CODASYL 1
• simple data structure? cf. complexity of DBTG sets and pointers
• simple operators? no high-level operators, nothing at the set level; have to record state; pointers create modes
• no frivolous distinctions? complex methods of interaction with the DB, e.g. updating a relation and imposing a constraint would be dealt with in entirely separate ways
• SQL support? no concept of a high-level query, yet CODASYL was widespread!
• the view mechanism? has no analogue in CODASYL
• sound theoretical base? no discernible theory in the CODASYL framework

Date's concerns and CODASYL 2
• small number of concepts? multi-mode + proliferation of mechanisms: for representation, and in the ways to select, insert, delete, update
• the dual-mode principle? no clear distinction between high-level queries and the application programmer's mode of access
• physical data independence? conceptual model mixed with physical database
• logical data independence? no provision for user views
• ease of application development? CODASYL doesn't make data access much easier
• dynamic data definition? DB design has to be carefully preconceived and can't easily be adapted

Date's concerns and CODASYL 3
• ease of installation and ease of operations? CODASYL probably keeps program surgery busy
• simplified database design? principles for database design more suspect
• integrated dictionary? meta-level issues not addressed within the model
• distributed database support? who'd like to parallelise CODASYL updates?
• performance? CODASYL performance was traditionally better than that of relational systems!
• extendability? CODASYL is not something to be built on ...
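To make the CODASYL/SQL comparison concrete, the HVFC query "Find how much Granola Brooks has ordered" can be emulated in Python twice: once record-at-a-time, following the MEMBORD and ITEMORD projections much as the CODASYL program follows its set links, and once as a single SQL statement against an in-memory SQLite table. This is a sketch under stated assumptions: the sample data are invented, the Python functions `MEMBORD`, `ITEMORD`, `orders_relationship`, `navigational_quantity` and `declarative_quantity` are hypothetical names mirroring the slides, and a filtered list scan stands in for real CODASYL set pointers and currency.

```python
import sqlite3
from typing import List, Optional, Set, Tuple

# An ORDERS record as on the slides: (m_name, i_name, quantity).
Order = Tuple[str, str, int]

# Hypothetical sample data: the members, items and quantities are invented
# for illustration and do not come from the course notes.
ORDERS: List[Order] = [
    ("Brooks", "Muesli", 2),
    ("Brooks", "Granola", 5),
    ("Smith", "Granola", 1),
]


def MEMBORD(order: Order) -> str:
    """Many-one projection ORDERS -> MEMBERS."""
    return order[0]


def ITEMORD(order: Order) -> str:
    """Many-one projection ORDERS -> ITEMS."""
    return order[1]


def orders_relationship(orders: List[Order]) -> Set[Tuple[str, str]]:
    """MEMBORD^-1 . ITEMORD: member m is related to item i exactly when
    some ORDERS record r has MEMBORD(r) == m and ITEMORD(r) == i."""
    return {(MEMBORD(r), ITEMORD(r)) for r in orders}


def navigational_quantity(orders: List[Order], member: str, item: str) -> Optional[int]:
    """Record-at-a-time access in the spirit of the CODASYL program."""
    # Stand-in for walking the MEMBORD set owned by the current MEMBERS
    # record: a plain scan that skips other members' orders.
    for record in orders:
        if MEMBORD(record) != member:
            continue
        # FIND OWNER OF CURRENT ITEMORD SET; GET ITEMS; INAME
        if ITEMORD(record) == item:
            return record[2]  # GET ORDERS; QUANTITY
    return None  # set exhausted: "if FAIL then break LOOP"


def declarative_quantity(orders: List[Order], member: str, item: str) -> Optional[int]:
    """The same question as one set-level SQL statement (via sqlite3)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE ORDERS (ORDER_NO INTEGER PRIMARY KEY, "
            "NAME TEXT, ITEM TEXT, QUANTITY INTEGER)"
        )
        # ORDER_NO is filled in automatically by SQLite.
        conn.executemany(
            "INSERT INTO ORDERS (NAME, ITEM, QUANTITY) VALUES (?, ?, ?)", orders
        )
        row = conn.execute(
            "SELECT QUANTITY FROM ORDERS WHERE NAME = ? AND ITEM = ?",
            (member, item),
        ).fetchone()
    finally:
        conn.close()
    return row[0] if row else None


if __name__ == "__main__":
    print(navigational_quantity(ORDERS, "Brooks", "Granola"))  # 5
    print(declarative_quantity(ORDERS, "Brooks", "Granola"))  # 5
    print(sorted(orders_relationship(ORDERS)))
```

Both functions return the same answer, but the navigational version has to spell out the access path (where to start, which link to follow next, when to stop), whereas the SQL version states only the condition and leaves the access path to the DBMS. That difference is what Date's "simple operators" and "performance: down to the optimiser" points turn on.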
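Date's "simple operators" point (a small set of relational operators, set-level operations, closure), in contrast to CODASYL's "nothing at the set level", can be sketched as follows. This assumes the five operators are the usual textbook primitives (selection, projection, Cartesian product, union and difference); the `(heading, body)` encoding of a relation and the HVFC sample data are hypothetical choices made for illustration. Because every operator takes relations and returns a relation, queries nest freely: that is the closure property.

```python
from typing import Callable, Dict, FrozenSet, Tuple

# A relation is modelled here as (heading, body): a tuple of attribute
# names plus a frozenset of value tuples. This encoding is an illustrative
# assumption, not a standard library type.
Heading = Tuple[str, ...]
Relation = Tuple[Heading, FrozenSet[tuple]]


def select(r: Relation, pred: Callable[[Dict[str, object]], bool]) -> Relation:
    """Selection (restriction): keep the tuples satisfying pred."""
    heading, body = r
    return heading, frozenset(t for t in body if pred(dict(zip(heading, t))))


def project(r: Relation, attrs: Heading) -> Relation:
    """Projection onto attrs; duplicates vanish because the body is a set."""
    heading, body = r
    idx = [heading.index(a) for a in attrs]
    return attrs, frozenset(tuple(t[i] for i in idx) for t in body)


def product(r: Relation, s: Relation) -> Relation:
    """Cartesian product; this sketch has no rename, so headings must be disjoint."""
    (h1, b1), (h2, b2) = r, s
    if set(h1) & set(h2):
        raise ValueError("product needs disjoint headings")
    return h1 + h2, frozenset(t1 + t2 for t1 in b1 for t2 in b2)


def _same_heading(r: Relation, s: Relation) -> None:
    if r[0] != s[0]:
        raise ValueError("union/difference need identical headings")


def union(r: Relation, s: Relation) -> Relation:
    _same_heading(r, s)
    return r[0], r[1] | s[1]


def difference(r: Relation, s: Relation) -> Relation:
    _same_heading(r, s)
    return r[0], r[1] - s[1]


# Hypothetical HVFC data (illustrative values only).
ORDERS: Relation = (
    ("ORDER_NO", "NAME", "ITEM", "QUANTITY"),
    frozenset({
        (1, "Brooks", "Muesli", 2),
        (2, "Brooks", "Granola", 5),
        (3, "Smith", "Granola", 1),
        (4, "Jones", "Muesli", 3),
    }),
)

# "Find how much Granola Brooks has ordered" as nested set-level operators.
brooks_granola = project(
    select(ORDERS, lambda t: t["NAME"] == "Brooks" and t["ITEM"] == "Granola"),
    ("QUANTITY",),
)

# Members who have never ordered Granola.
no_granola = difference(
    project(ORDERS, ("NAME",)),
    project(select(ORDERS, lambda t: t["ITEM"] == "Granola"), ("NAME",)),
)

print(brooks_granola)  # (('QUANTITY',), frozenset({(5,)}))
print(no_granola)      # (('NAME',), frozenset({('Jones',)}))
```

A fuller algebra would also provide a rename operator, so that `product` could combine relations such as MEMBERS and ORDERS that share the attribute NAME; joins can then be derived as a selection over a product.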
Interpreting Date's defence of relational models
Date's arguments in defence of relational models are very powerful when seen in the context of CODASYL.
We need to understand them in relation to what might be the best data modelling practices for today and the future.
For this purpose it is important to classify the defences:
• defence from theory: is the theory adequate?
• defence from practice: will the practice change?
• special qualities exhibited by the relational model: are they particular to RDBs, or generalisable?

Classifying the advantages cited by Date 1
We will classify Date's list of advantages into THEORY, PRINCIPLES and CONSEQUENCES, and further subdivide PRINCIPLES into PRACTICAL & FOUNDATIONAL.
THEORY
• simple data structure
• simple operators
• no frivolous distinctions
• sound theoretical base
• small number of concepts

Classifying the advantages cited by Date 2
PRINCIPLES - PRACTICAL ASPECT
• SQL support
• the view mechanism
• the dual-mode principle
• physical data independence
• logical data independence
• dynamic data definition
PRINCIPLES - FOUNDATIONAL ASPECT
• simplified database design
• integrated dictionary: metadata-driven
• distributed database support: atomicity

Classifying the advantages cited by Date 3
CONSEQUENCES
• ease of application development
• ease of installation and ease of operations
• performance
• extendability
The status of these advantages is relevant when we come to consider what is really significant about the relational model in comparison with other alternatives ...

… will return to express personal views concerning the defence of the relational position later
… turn next to the issue of 'Why not Relational?' (whynotrel.ppt)

What are the virtues of the relational model? 1
Certain features of relational models we wish to retain ...
The defence from theory …
• simple data structure: want elegant and consistent structures
• simple operators: want high-level operators; need techniques at the set level; don't want to have to record state; don't want to maintain pointers

What are the virtues of the relational model? 2
Certain features of relational models we wish to retain ...
The defence from theory …
• small number of concepts: want a unified view for representation; uniform ways to manipulate
• sound theoretical base: want to be able to apply mathematical techniques
... but all these attributes apply to Miranda, for example, and this hasn't made it widely / wildly successful

What are the virtues of the relational model? 3
The defence from practice ...
• SQL support: need concept of high-level query
• the view mechanism: must be able to represent different user views
• the dual-mode principle: invoking user commands automatically is a powerful principle for program development and debugging
• physical and logical data independence: must be possible to separate concerns at high and low levels of abstraction
... but do these qualities fit into a general scheme, or are they specific to the relational framework?

What are the virtues of the relational model? 4
Evidence of special suitability for real-world modelling ...
• simplified database design: have principles for database design; contrast the messiness of CODASYL
• integrated dictionary: can write metadata-driven programs; no chance to take a high-level view in CODASYL
• distributed database support: high semantic content of queries, atomicity of action; queries in CODASYL say little about the real world

What are the virtues of the relational model? 5
Evidence of special suitability for real-world modelling ...
...
database design reveals very direct connections between dependencies amongst attributes of real-world objects and the forms for their representation in relation schemes:
content (real-world meaning) dictating form (structure of the representation)
Fundamental conflict between theory and practice over the relationship between form and content

What are the virtues of the relational model? 6
Important aspects of relational DBs (in WMB's view)
THEORY aspect: underlying algebraic model
• provides basis for unambiguous evaluation
• closure properties
• potential for optimisation & axiomatisation
PRINCIPLES: represented in views + application generators + spreadsheets

What are the virtues of the relational model? 7
Important aspects of relational DBs (in WMB's view)
PRACTICAL aspect
• involve state essentially, so not purely declarative
• good for expressing agent actions / views
• good for representing levels of abstraction
cf. ACE & A Small Matter of Programming, Bonnie Nardi
Represents a framework for managing state: cleaner than procedural programming, more expressive than FP

What are the virtues of the relational model? 8
Important aspects of relational DBs (in WMB's view)
FOUNDATIONAL aspect
• concerned with metaphor, not symbolic representation
• invokes form and content in combination
Notes on these respective issues
• metaphor: the form reflects the content [as is true to some degree of relational models]
• cf. the logicism debate in AI: McDermott, A Critique of Pure Reason, et seq.

Issues for database development 1
How to avoid "back to the future"?
• need theoretical foundation
• need qualities of declarative query
• need principles to handle abstraction at many levels: data independence
• need to support interaction of agents at high levels of abstraction
• need to retain / replace the form-content relationships that relational DB design theory introduces

Issues for database development 2
Modern database demands
• enormous volumes of data
• high performance, e.g. for multimedia, real-time
• support for metaphor, e.g. visual image not table
• concurrent access, distributed data
• closer integration between direct (human) and programmed (computer) data access
• support for modern data abstractions: objects, inheritance, aggregation
• applicability to design environment needs: incremental intensional change