* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download A Short History of Database Technology Traditional File
Survey
Document related concepts
Transcript
• Example: student data in a university – student address may be needed for registering, library management, financial office, grade reporting, ... – each application separately maintains its data files and programs to manipulate those files A Short History of Database Technology – possibly different formats for the same data (e.g., length of names) – redundant updates (e.g., to change an address) • Prehistory – file systems – hierarchical and network systems • The revolution: relational database technology • Post-relational era – new organizations of data, more complex data Limitations of File-Based Approach • Tendency to separate and isolate logically-related data – subsequent data models capture more information: true DB software “knows about” some inter-entity or inter-file relationships – influence of object technology – more complex applications • Separate files ⇒ redundancy in defining and storing data – wasted storage space (only a problem for large applications) – redundant efforts to enter replicated data and maintain its consistency (serious problem) 1 • Program-data dependence: data definition in application program ⇒ program valid for only one DB with a fixed structure • Labor intensive – maintenance difficult (duplication, procedurality) – inter-file links must be coded in programs 3 Traditional File-Based Approach • Collection of applications that each define and manage their own files • File technology was the first attempt to computerize manual filing systems • File: collection of records containing logically-related data (Pascal’s files of records, C++’s “struct” or “class” declarations, COBOL’s Data Division) • It is an obsolete technology, except for applications that do not require DBMS functions. Even for those problems, when efficiency is not crucial and size is not too big, it is often worth considering simple database software (like, e.g., ACCESS). • Tight integration between program and data (files) – physical storage structure visible in application code (e.g., ISAM, VSAM) – run-time performance can (and must) be programmed • Each enterprise function was responsible for its own data and accessed it with their own programs • File-based approach required much work from application programmers 2 History of Database Management- 1 History of Database Management- 2 Hierarchical and Network Systems • Software systems without underlying organized discipline • Lack of data independence Relational Model – traditional view of data structures as records (on disk storage) + links: mixture of physical and logical worlds – user view dependent on physical details irrelevant for information needs – modern view: links = semantic relationships + physical access paths • Navigational programming – all DB accesses imply procedural programming – programming = “navigation” governed by physical access paths =⇒ long and complex programs • Defined in 1970 (T. Codd) • About 10 years of hesitation • Robust RDBMSs built in the 80’s • Historical compromise between scientists, practitioners, vendors • Most systems sold in the early 90’s are relational – oriented towards one-record-at-a-time navigational programming ⇒ complex, necessarily-manual performance optimization 4 5 • Pre-relational (or pre-historical) data management, unsystematic, adhoc • Hierarchical systems (e.g., IMS, System 2000) • CODASYL (network) database systems (e.g., IDS, IDMS, TOTAL, ADABAS, PHOLAS) • Example of data dependence: physical location of fields within records visible in programs • About pointers: – pointers provide more explicit integration links (hierarchy, network) than relations – confusion between logical and physical pointers was the problem: the relational model established and demonstrated the virtues of data independence – somewhat cleaner pointers came back with object technology Novelty and Importance of the Relational Model • Simplicity of basic concepts • Simple theoretical framework • Emergence of notion of data model • Data independence: separate universe of application from that of implementation • Emergence of disciplined DB design • Validation of a declarative, high-level approach to data management • Construction of powerful DBMS software • New architectures • Standards 6 History of Database Management- 3 History of Database Management- 4 Relational Model and Systems Founded Modern DB Technology • Simplicity of basic concepts: simple, abstract data structure, independent of physical considerations, sufficiently powerful and general as a realistic framework for numerous database applications in business data processing • Simple basis for a theoretical framework that stimulated DB research – relational theory based on well-established theories: mathematical set theory and first-order predicate logic – the DB field turns into a scientific discipline, the pre-relational era had drawn little interest from computer scientists – many interesting questions were raised for researchers and the field attracted a lot of good minds; progress was quick in the 70’s – remark about theory in general: theoretical basis 6= lack of practicality ∗ theory = reliability, predictability ∗ a strong formal underlying theory is always interesting ∗ users need not master nor know about the whole theory • Emergence of the notion of data model, quickly leading to semantic data models with more semantics than the relational model, and a clean a posteriori definition of network and hierarchic data models. Data models include the definition of integrity at the application-domain level, as a part of database design. • Data independence: insulation of users from physical DB design and access optimization; all general trend in informatics Relational Market Today • Most users still buy relational, huge installed base • Leading relational vendors: Oracle, Sybase, Informix, IBM, Microsoft • Products of similar functionality and performance • Emphasis on client/server architecture, open systems, interoperability • System performance, robustness have consistently strengthened (e.g., recovery, concurrency support, distribution) • Numerous client tools (4GLs), users want forms, pictures, graphs, sounds, etc. not tables of numbers. • But the web is making 4GLs and business forms history ... • PC systems, competing with the larger DBMSs for lower-end applications 7 • Emergence of disciplined DB design: a more semantic or conceptual view of information structures (e.g., E-R) on top of the relational schema significantly simplifies analysis and design process • Design of high-level interactive database languages, based on set-at-a-time, nonprocedural operations, high-level operations (a first in the DB world): demonstrate that a less procedural style of data manipulation was not only practically feasible but also highly productive • Construction of efficient DBMS software, in particular efficient query optimizers, transaction management for multiple users (concurrency control), reliability and failure recovery Relational Market ... • New architectures: database machines, client-server, distributed databases, client tools, 4GLs • SQL has been overemphasized, overadvertised, has become inevitable (“COBOL of the 90’s”) • SQL Standard = portability of applications • Widespread lack of education on DB fundamentals • Some built-in economic incentives for complexity and confusion: market for books, journals, conferences, education, consulting, ... • SQL standardization has become an inhibitor to the development of better user interfaces 8 History of Database Management- 5 History of Database Management- 6 • Economics – reduction of hardware costs (memory, computing cycles, soon network bandwidth are essentially free at the margin) Post-Relational Era • New organizations of data – entity-relationship model – nonnormalized relations – object technology – object-relational systems – semi-structured and unstructured data (WWW) • New functions – distribution – heterogeneity (multidatabases, interoperability) – deduction – data analysis (data mining) • More complex applications: data-management technology enters application domains (e.g., design, geography, molecular biology, electronic commerce) – 25 years ago, one hour of machine time ≈ one week of programmer time – today, one week of programmer time ≈ a personal computer – relational technology has provided huge overall productivity gains • Evolution of DBMS software – development of efficient DBMS functions (e.g., relational query optimizers) – easier DB programming – powerful and user-friendly front-end software (traditional DB software is progressively becoming middleware) From the web: If the automobile had followed the same development cycle as the computer, a Rolls-Royce would today cost 100 USD, get one million miles to the gallon, and explode once a year, killing everybody inside. 9 Weaknesses of Relational Technology Evolution = Economics + Methodology • Economics – name of the game 30 years ago: optimize the use of expensive hardware – analogous evolution in all domains of informatics (e.g., from machine language to high-level programming languages) – human efficiency has become the crucial quality factor • Methodological advances – development of software engineering (“software crisis”) – object technology – methods for software development (from structured methods to OO analysis and design) • Built-in data types (essentially alphanumeric), inadequate for applications with complex data (e.g., design, WWW, multimedia, GIS, biology) • Poor conceptual modeling (e.g., generalization, aggregation, materialization must be explicitly managed by user programs) • Separation between data and operations (stored procedures not integrated in the model, no encapsulation) • RDBMS have not fully exploited the relational model (e.g., domains, integritycontraint support) • Transaction model inappropriate for long-duration transactions for interactive, cooperative design environments • Application programming: mismatch between SQL and traditional programming languages → 4GLs, OODBMSs • Performance of RDBMS insufficient for decision-support applications (⇒ OLAP technology) 11 – continuous migration of program functions into DBMS functions 10 History of Database Management- 7 History of Database Management- 8 Information-Management Context Contributions of OO Technology • Objects are more natural than relations for real-world modeling • Application functions are less stable than data • Complex and application-dependent data types simplify application development • Object identity, for object chaining, sharing • Data encapsulation (ADTs) – some behavior naturally goes with data • Relational technology “solved” business data processing • Success of database technology ⇒ users demand it for domains other than business data processing • Steady improvement in price/performance ratio of hardware and supporting architectures • “Software crisis”: – quality of application development is becoming a major concern – maintenance dominates overall development costs ⇒ reuse – stable public interface for objects – application-development methodology is improving – greater stability, easier evolution • Generalization and inheritance, for reuse, easier adaptation • Polymorphism, for simpler programming, easier evolution • Open systems ⇒ variety of system architectures • Transition to objects has become inevitable 13 12 Application Context • Today’s applications: objects + web + multimedia • More complex data – time series (data warehouses, data mining ⇒ progress in decision-support systems) – multimedia – weakly-structured data (WWW) – complex data (e.g., molecular biology, geographic information systems) • Internet, emergence of electronic commerce 14 History of Database Management- 9 History of Database Management- 10 • Static HTML and GIF files are no longer enough • Need for dynamic, interactive applications, with audio and video streams • Sophisticated, integrated multimedia asset management • Possible convergence data-modeling and document-management technologies: – originally, most web documents were manually composed – they are increasingly generated automatically from databases – HTML is a document-markup language, not a format for document integration and exchange – XML is closer to “semi-structured” databases Problems with OODBMSs • Language-bound (closely tied to C++, Smalltalk, Java) • Rediscover problems of tying database design close to application design • Rediscover that declarative (SQL-like) approach is productive • Lack of robustness for major DBMS functions for update-intensive OLTP applications (security, transaction processing, parallel processing) Object-Oriented Database-Management Systems (OODBMSs) • Extension of object-oriented programming languages (OPLs) to persistent data management 16 • Tighter (seamless) integration between OPLs (C++, Smalltalk, Java) and DBMS • Extensible type system for complex objects • OODBMSs manage (store, call, query) complex objects directly • Performance thru caching large numbers of objects in address space of client program OODBMS Market 15 • Gradually appeared in the early 90’s • Vendors: Object Design, Objectivity, Versant, Computer Associates (Jasmine), Ardent (O2 ) • Niche market (about 100 times smaller than relational market, 100 M vs 10 B) • Market development hindered by existing relational base 17 History of Database Management- 11 History of Database Management- 12 Object-Relational DBMSs • Goals – synthesis of the best of RDBMSs and OODBMSs, i.e., flexibility of data model and performance of DBMS functions OR DBMS Market – saveguard RDBMS investments • Main relational vendors: Informix, Oracle, Sybase, IBM, Microsoft • Differences with RDBMSs – extensible type system (application-dependent new types and functions) – OO inheritance and polymorphism – better integration with PLs than SQL-92 • Roughly similar products • Big business! • Differences with OODBMSs – separate declarative data sublanguage (instead of smooth integration between data language and PL) 18 20 Recent Evolutions Problems with ORDBMS • RDBMS vendors have maintained their strong position • Hybrid approach • Technological integration without new concepts • Weak on inheritance and polymorphism • No sound theoretical foundations • RDBMS are evolving into object-relational systems • New RDBMS developments ensure downward compatibility • Microsoft has entered the mainstream DB server market • OODBMS vendors have not moved from niche markets and have not displaced RDBMS • Bound to compromising • Standards have had a mixed role • New, unproven • Tool vendors have lost importance through extensions of main DBMS products 19 21 History of Database Management- 13 History of Database Management- 14 New Topics (Buzzwords) • Object technology • Conceptual modeling • Enterprise resource planning (ERP) • Data mining • Data warehouses • Knowledge management, competitive intelligence • Interoperability, heterogeneity, integration, ontologies, multidatabases • Object-relational systems • Workflow management 22 History of Database Management- 15