A Short History of Database Technology
• Prehistory
– file systems
– hierarchical and network systems
• The revolution: relational database technology
• Post-relational era
– new organizations of data, more complex data
– influence of object technology
– more complex applications
Traditional File-Based Approach
• Collection of applications that each define and manage their own files
• File technology was the first attempt to computerize manual filing systems
• File: collection of records containing logically-related data (Pascal’s files of
records, C++’s “struct” or “class” declarations, COBOL’s Data Division)
• It is an obsolete technology, except for applications that do not require DBMS functions. Even for those problems, when efficiency is not crucial and size is not too big,
it is often worth considering simple database software (e.g., Microsoft Access).
• Tight integration between program and data (files)
– physical storage structure visible in application code (e.g., ISAM, VSAM)
– run-time performance can (and must) be programmed
• Each enterprise function was responsible for its own data and accessed it with its
own programs
• File-based approach required much work from application programmers
• Example: student data in a university
– student address may be needed for registering, library management, financial
office, grade reporting, ...
– each application separately maintains its data files and programs to manipulate
those files
– possibly different formats for the same data (e.g., length of names)
– redundant updates (e.g., to change an address)
Limitations of File-Based Approach
• Tendency to separate and isolate logically-related data
– subsequent data models capture more information: true DB software “knows
about” some inter-entity or inter-file relationships
• Separate files ⇒ redundancy in defining and storing data
– wasted storage space (only a problem for large applications)
– redundant efforts to enter replicated data and maintain its consistency (serious problem)
• Program-data dependence: data definition in application program ⇒ program valid for only one DB with a fixed structure
• Labor intensive
– maintenance difficult (duplication, procedurality)
– inter-file links must be coded in programs
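The program-data dependence of the file-based approach can be illustrated with a small sketch. It assumes a hypothetical fixed-width student record (a 20-byte name followed by a 30-byte address); the layout, the field widths, and all function names are illustrative assumptions, not taken from any particular file system.

```python
import struct

# Hypothetical record layout: 20-byte name, then 30-byte address.
RECORD_V1 = struct.Struct("20s30s")


def write_student(name, address, layout=RECORD_V1):
    """Pack one student record into a fixed-width binary layout."""
    return layout.pack(name.encode("ascii"), address.encode("ascii"))


def read_address(record):
    """Application code that hard-codes the physical layout:
    it assumes the address occupies bytes 20..49 of the record."""
    return record[20:50].rstrip(b"\x00").decode("ascii")


record = write_student("Ada Lovelace", "12 Example Street")
print(read_address(record))

# Another application widens the name field to 40 bytes.
# read_address() still runs, but it now reads the wrong bytes.
RECORD_V2 = struct.Struct("40s30s")
wider = write_student("Ada Lovelace", "12 Example Street", RECORD_V2)
print(repr(read_address(wider)))
```

Because the byte offsets live in the program rather than in a shared data definition, changing the length of a name silently breaks every program that reads the file.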
History of Database Management- 1
Hierarchical and Network Systems
• Pre-relational (or pre-historical) data management, unsystematic, ad hoc
• Hierarchical systems (e.g., IMS, System 2000)
• CODASYL (network) database systems (e.g., IDS, IDMS, TOTAL, ADABAS, PHOLAS)
• Software systems without underlying organized discipline
• Lack of data independence
– traditional view of data structures as records (on disk storage) + links: mixture of physical and logical worlds
– user view dependent on physical details irrelevant for information needs
– modern view: links = semantic relationships + physical access paths
• Example of data dependence: physical location of fields within records visible in programs
• Navigational programming
– all DB accesses imply procedural programming
– programming = “navigation” governed by physical access paths ⇒ long and
complex programs
– oriented towards one-record-at-a-time navigational programming ⇒ complex,
necessarily-manual performance optimization
Relational Model
• Defined in 1970 (E. F. Codd)
• About 10 years of hesitation
• Robust RDBMSs built in the 80’s
• Historical compromise between scientists, practitioners, vendors
• Most systems sold in the early 90’s are relational
• About pointers:
– pointers provide more explicit integration links (hierarchy, network) than relations
– confusion between logical and physical pointers was the problem: the relational
model established and demonstrated the virtues of data independence
– somewhat cleaner pointers came back with object technology
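The contrast between one-record-at-a-time navigation and the relational, set-at-a-time style can be sketched as follows. This is only an illustration under stated assumptions: both styles are run against SQLite through Python's standard `sqlite3` module, the `student` and `enrollment` tables are hypothetical, and the "navigational" function merely imitates record-by-record link following; it is not actual CODASYL or IMS code.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student(id INTEGER PRIMARY KEY, name TEXT, dept TEXT);
    CREATE TABLE enrollment(student_id INTEGER, course TEXT);
    INSERT INTO student VALUES (1, 'Ann', 'CS'), (2, 'Bob', 'Math'), (3, 'Cy', 'CS');
    INSERT INTO enrollment VALUES (1, 'DB'), (2, 'DB'), (3, 'OS');
""")


def navigational(con):
    """Record-at-a-time style: the program walks the records
    and follows the links between them itself."""
    names = []
    for student_id, course in con.execute(
        "SELECT student_id, course FROM enrollment"
    ):
        if course != "DB":
            continue
        # Follow the link from the enrollment record to its student record.
        row = con.execute(
            "SELECT name, dept FROM student WHERE id = ?", (student_id,)
        ).fetchone()
        if row is not None and row[1] == "CS":
            names.append(row[0])
    return sorted(names)


def declarative(con):
    """Set-at-a-time style: one query states what is wanted;
    the DBMS decides how to retrieve it."""
    rows = con.execute("""
        SELECT s.name
        FROM student s JOIN enrollment e ON e.student_id = s.id
        WHERE e.course = 'DB' AND s.dept = 'CS'
        ORDER BY s.name
    """)
    return [name for (name,) in rows]


print(navigational(con))
print(declarative(con))
```

Both functions return the CS students enrolled in the DB course, but only the first one encodes an access strategy in application code.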
Novelty and Importance of the Relational Model
• Simplicity of basic concepts
• Simple theoretical framework
• Emergence of notion of data model
• Data independence: separate universe of application from that of implementation
• Emergence of disciplined DB design
• Validation of a declarative, high-level approach to data management
• Construction of powerful DBMS software
• New architectures
• Standards
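Data independence can be illustrated with a minimal sketch: a physical design change (here, adding an index) leaves the application's query text and its result unchanged. The example assumes SQLite via Python's standard `sqlite3` module; the table, index name, and helper functions are hypothetical, and the exact wording of the access plan depends on the SQLite version.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student(id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
con.executemany(
    "INSERT INTO student(name, dept) VALUES (?, ?)",
    [("Ann", "CS"), ("Bob", "Math"), ("Cy", "CS")],
)

# The application only ever knows this logical query.
QUERY = "SELECT name FROM student WHERE dept = ? ORDER BY name"


def run_query(con):
    """Run the unchanged application query."""
    return [name for (name,) in con.execute(QUERY, ("CS",))]


def access_plan(con):
    """Return the DBMS's description of how it will run the query.
    The last column of each EXPLAIN QUERY PLAN row is a readable detail string."""
    return [row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + QUERY, ("CS",))]


before = run_query(con)
plan_before = access_plan(con)

# Physical design change: the application code above is not touched.
con.execute("CREATE INDEX idx_student_dept ON student(dept)")

after = run_query(con)
plan_after = access_plan(con)

print(before, after)   # same logical result before and after
print(plan_before)     # access path chosen without the index
print(plan_after)      # access path chosen once the index exists
```

The choice of access path is left to the DBMS, which is the insulation from physical design that pre-relational systems lacked.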
Relational Model and Systems Founded Modern DB Technology
• Simplicity of basic concepts: simple, abstract data structure, independent of physical
considerations, sufficiently powerful and general as a realistic framework for numerous
database applications in business data processing
• Simple basis for a theoretical framework that stimulated DB research
– relational theory based on well-established theories: mathematical set theory
and first-order predicate logic
– the DB field turned into a scientific discipline; the pre-relational era had drawn
little interest from computer scientists
– many interesting questions were raised for researchers and the field attracted a
lot of good minds; progress was quick in the 70’s
– remark about theory in general: theoretical basis ≠ lack of practicality
∗ theory = reliability, predictability
∗ a strong formal underlying theory is always interesting
∗ users need neither master nor know about the whole theory
• Emergence of the notion of data model, quickly leading to semantic data models
with more semantics than the relational model, and a clean a posteriori definition of
network and hierarchical data models. Data models include the definition of integrity
at the application-domain level, as a part of database design.
• Data independence: insulation of users from physical DB design and access optimization; a general trend in informatics
• Emergence of disciplined DB design: a more semantic or conceptual view of information structures (e.g., E-R) on top of the relational schema significantly simplifies
the analysis and design process
• Design of high-level interactive database languages, based on set-at-a-time, nonprocedural, high-level operations (a first in the DB world): they demonstrated
that a less procedural style of data manipulation was not only practically feasible but
also highly productive
• Construction of efficient DBMS software, in particular efficient query optimizers,
transaction management for multiple users (concurrency control), reliability and failure recovery
• New architectures: database machines, client-server, distributed databases, client
tools, 4GLs
Relational Market Today
• Most users still buy relational, huge installed base
• Leading relational vendors: Oracle, Sybase, Informix, IBM, Microsoft
• Products of similar functionality and performance
• Emphasis on client/server architecture, open systems, interoperability
• System performance, robustness have consistently strengthened (e.g., recovery,
concurrency support, distribution)
• Numerous client tools (4GLs): users want forms, pictures, graphs, sounds, etc.,
not tables of numbers
• But the web is making 4GLs and business forms history ...
• PC systems, competing with the larger DBMSs for lower-end applications
Relational Market ...
• SQL has been overemphasized, overadvertised, has become inevitable (“COBOL
of the 90’s”)
• SQL Standard = portability of applications
• Widespread lack of education on DB fundamentals
• Some built-in economic incentives for complexity and confusion: market for
books, journals, conferences, education, consulting, ...
• SQL standardization has become an inhibitor to the development of better user
interfaces
Post-Relational Era
• New organizations of data
– entity-relationship model
– nonnormalized relations
– object technology
– object-relational systems
– semi-structured and unstructured data (WWW)
• New functions
– distribution
– heterogeneity (multidatabases, interoperability)
– deduction
– data analysis (data mining)
• More complex applications: data-management technology enters application
domains (e.g., design, geography, molecular biology, electronic commerce)
Evolution = Economics + Methodology
• Economics
– name of the game 30 years ago: optimize the use of expensive hardware
– analogous evolution in all domains of informatics (e.g., from machine language
to high-level programming languages)
– human efficiency has become the crucial quality factor
– reduction of hardware costs (memory, computing cycles, and soon network bandwidth
are essentially free at the margin)
– 25 years ago, one hour of machine time ≈ one week of programmer time
– today, one week of programmer time ≈ a personal computer
– relational technology has provided huge overall productivity gains
• Methodological advances
– development of software engineering (“software crisis”)
– object technology
– methods for software development (from structured methods to OO analysis
and design)
• Evolution of DBMS software
– development of efficient DBMS functions (e.g., relational query optimizers)
– easier DB programming
– powerful and user-friendly front-end software (traditional DB software is progressively becoming middleware)
– continuous migration of program functions into DBMS functions
From the web:
If the automobile had followed the same development cycle as the computer, a Rolls-Royce
would today cost 100 USD, get one million miles to the gallon, and explode once a year,
killing everybody inside.
Weaknesses of Relational Technology
• Built-in data types (essentially alphanumeric), inadequate for applications with
complex data (e.g., design, WWW, multimedia, GIS, biology)
• Poor conceptual modeling (e.g., generalization, aggregation, materialization must
be explicitly managed by user programs)
• Separation between data and operations (stored procedures not integrated in the
model, no encapsulation)
• RDBMS have not fully exploited the relational model (e.g., domains, integrity-constraint support)
• Transaction model inappropriate for long-duration transactions for interactive,
cooperative design environments
• Application programming: mismatch between SQL and traditional programming
languages → 4GLs, OODBMSs
• Performance of RDBMS insufficient for decision-support applications (⇒ OLAP
technology)
Information-Management Context
• Relational technology “solved” business data processing
• Success of database technology ⇒ users demand it for domains other than business data processing
• Steady improvement in price/performance ratio of hardware and supporting architectures
• “Software crisis”:
– quality of application development is becoming a major concern
– maintenance dominates overall development costs ⇒ reuse
– application-development methodology is improving
• Open systems ⇒ variety of system architectures
• Transition to objects has become inevitable
Contributions of OO Technology
• Objects are more natural than relations for real-world modeling
• Application functions are less stable than data
• Complex and application-dependent data types simplify application development
• Object identity, for object chaining, sharing
• Data encapsulation (ADTs)
– some behavior naturally goes with data
– stable public interface for objects
– greater stability, easier evolution
• Generalization and inheritance, for reuse, easier adaptation
• Polymorphism, for simpler programming, easier evolution
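The object-oriented notions listed under Contributions of OO Technology (encapsulation behind a stable public interface, inheritance, polymorphism, object identity) can be sketched in a few lines. The `Shape` hierarchy below is a hypothetical illustration chosen for brevity, not an example from the original slides.

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    """Stable public interface: callers use area() and describe() only."""

    def __init__(self, label):
        self.label = label

    @abstractmethod
    def area(self):
        """Each subclass supplies its own computation."""

    def describe(self):
        # Inherited behavior, reused unchanged by every subclass.
        return f"{self.label}: {self.area():.2f}"


class Circle(Shape):
    def __init__(self, label, radius):
        super().__init__(label)
        self._radius = radius  # representation hidden behind the interface

    def area(self):
        return math.pi * self._radius ** 2


class Rectangle(Shape):
    def __init__(self, label, width, height):
        super().__init__(label)
        self._width = width
        self._height = height

    def area(self):
        return self._width * self._height


# Object identity: the same circle object is shared by two drawings.
shared = Circle("disc", 1.0)
drawing_a = [shared, Rectangle("plate", 2.0, 3.0)]
drawing_b = [shared]

# Polymorphism: one call site, behavior selected by the object's class.
descriptions = [shape.describe() for shape in drawing_a]
print(descriptions)
print(drawing_a[0] is drawing_b[0])
```

Changing how a circle stores its size would not affect any caller, which is the "greater stability, easier evolution" the slide refers to.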
Application Context
• Today’s applications: objects + web + multimedia
• More complex data
– time series (data warehouses, data mining ⇒ progress in decision-support
systems)
– multimedia
– weakly-structured data (WWW)
– complex data (e.g., molecular biology, geographic information systems)
• Internet, emergence of electronic commerce
• Static HTML and GIF files are no longer enough
• Need for dynamic, interactive applications, with audio and video streams
• Sophisticated, integrated multimedia asset management
• Possible convergence of data-modeling and document-management technologies:
– originally, most web documents were manually composed
– they are increasingly generated automatically from databases
– HTML is a document-markup language, not a format for document integration
and exchange
– XML is closer to “semi-structured” databases
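The remark that XML is closer to “semi-structured” databases can be made concrete with a short sketch. It assumes a hypothetical `<students>` document in which records do not share a fixed set of fields, and it uses Python's standard `xml.etree.ElementTree` module; element names and the helper function are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the two records carry different fields,
# and one field (phone) is repeated.
DOC = """
<students>
  <student id="1"><name>Ann</name><address>12 Example Street</address></student>
  <student id="2"><name>Bob</name><phone>555-0100</phone><phone>555-0101</phone></student>
</students>
"""


def to_records(xml_text):
    """Turn each <student> element into a dict without assuming a fixed schema."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for student in root.findall("student"):
        record = {"id": student.get("id")}
        for child in student:
            # Every field is kept as a list because it may be absent or repeated.
            record.setdefault(child.tag, []).append(child.text)
        records.append(record)
    return records


records = to_records(DOC)
for record in records:
    print(record)
```

Unlike rows of a relational table, the records describe their own structure, so optional and repeated fields need no schema change.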
Object-Oriented Database-Management Systems
(OODBMSs)
• Extension of object-oriented programming languages (OPLs) to persistent data
management
• Tighter (seamless) integration between OPLs (C++, Smalltalk, Java) and DBMS
• Extensible type system for complex objects
• OODBMSs manage (store, call, query) complex objects directly
• Performance through caching large numbers of objects in address space of client
program
Problems with OODBMSs
• Language-bound (closely tied to C++, Smalltalk, Java)
• Rediscover problems of tying database design close to application design
• Rediscover that declarative (SQL-like) approach is productive
• Lack of robustness for major DBMS functions for update-intensive OLTP applications (security, transaction processing, parallel processing)
OODBMS Market
• Gradually appeared in the early 90’s
• Vendors: Object Design, Objectivity, Versant, Computer Associates (Jasmine),
Ardent (O2)
• Niche market (about 100 times smaller than relational market, 100 M vs 10 B)
• Market development hindered by existing relational base
Object-Relational DBMSs
• Goals
– synthesis of the best of RDBMSs and OODBMSs, i.e., flexibility of data
model and performance of DBMS functions
– safeguard RDBMS investments
• Differences with RDBMSs
– extensible type system (application-dependent new types and functions)
– OO inheritance and polymorphism
– better integration with PLs than SQL-92
• Differences with OODBMSs
– separate declarative data sublanguage (instead of smooth integration between
data language and PL)
OR DBMS Market
• Main relational vendors: Informix, Oracle, Sybase, IBM, Microsoft
• Roughly similar products
• Big business!
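The extensible type system listed among the differences with RDBMSs (application-dependent new types and functions) can be approximated in a small sketch. SQLite is not an object-relational DBMS, so this is only an analogy: it uses the adapter, converter, and user-defined-function hooks of Python's standard `sqlite3` module. The `Point` type, the `point` column type, and the `dist0` function are hypothetical names introduced for illustration.

```python
import math
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Point:
    """Application-defined type that the database does not know natively."""
    x: float
    y: float


def adapt_point(point):
    """Python object -> stored text representation."""
    return f"{point.x};{point.y}"


def convert_point(raw):
    """Stored bytes -> Python object (converters receive bytes)."""
    x, y = raw.split(b";")
    return Point(float(x), float(y))


def distance_from_origin(stored):
    """Application-defined function callable from SQL.
    It receives the stored text representation of a point."""
    x, y = stored.split(";")
    return math.hypot(float(x), float(y))


sqlite3.register_adapter(Point, adapt_point)
sqlite3.register_converter("point", convert_point)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.create_function("dist0", 1, distance_from_origin)
con.execute("CREATE TABLE site(name TEXT, loc point)")
con.executemany(
    "INSERT INTO site VALUES (?, ?)",
    [("lab", Point(3.0, 4.0)), ("annex", Point(30.0, 40.0))],
)

# The new type and the new function are both used inside a declarative query.
near = con.execute("SELECT name, loc FROM site WHERE dist0(loc) < 10").fetchall()
print(near)
```

The query language stays declarative while the set of types and functions grows with the application, which is the combination object-relational systems aim for.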
Problems with ORDBMS
• Hybrid approach
• Technological integration without new concepts
• Weak on inheritance and polymorphism
• No sound theoretical foundations
• Bound to compromising
• New, unproven
Recent Evolutions
• RDBMS vendors have maintained their strong position
• RDBMS are evolving into object-relational systems
• New RDBMS developments ensure downward compatibility
• Microsoft has entered the mainstream DB server market
• OODBMS vendors have not moved from niche markets and have not displaced
RDBMS
• Standards have had a mixed role
• Tool vendors have lost importance through extensions of main DBMS products
New Topics (Buzzwords)
• Object technology
• Conceptual modeling
• Enterprise resource planning (ERP)
• Data mining
• Data warehouses
• Knowledge management, competitive intelligence
• Interoperability, heterogeneity, integration, ontologies, multidatabases
• Object-relational systems
• Workflow management