A Lecture Note on DATABASE SYSTEMS
Fall, 2001
School of Computing, Soongsil University
Prof. Sang Ho Lee
[email protected]

Table of Contents
– Introduction
– ER data model, OO data model
– Relational data model
– Normalization: functional dependency, multivalued dependency
– Relational algebra
– Datalog
– Relational Query Language: SQL
– Constraints and Triggers in SQL
– SQL programming, Transaction, Authorization
– Object-oriented query language: OQL, SQL3

Introduction

DB, DBMS, DBS
• Databases
  – informally, a collection of related data
  – in general, stored in a computer, too large to hold in main memory, and subject to constant change
• DBMS (Database Management System)
  – a collection of software that not only allows us to define, construct, and manipulate databases but also provides a number of desirable functionalities (data independence, data sharing, recovery, security, etc.)
  – a fundamental piece of system software
• DBS (Database System)
  – in general, Databases + DBMS
  – the terms Database, DBMS, and DBS are often used interchangeably (at least in practice)

Simplified Database System
[Figure: Users/Programs issue Application Programs/Queries to the Database System; inside it, the DBMS Software (Query Processor, Storage Manager) accesses the stored Databases and Metadata]

Typical Applications of DB Technology
• Airline reservation systems
• Banking systems
• Corporate data
• Stock markets
• World Wide Web (in short, the web)
• many, many more

Databases in General
• Unquestionably one of the fundamental courses in Computer Science
• Constitute fundamental system software
• Play a core role in information technology
• Have millions of applications in the real world
• Strong technology demand from industry (compared with other areas in Computer Science)

Recommended Textbooks
– J.D. Ullman and J. Widom, A First Course in Database Systems, Prentice Hall, 1997.
– H. Garcia-Molina, J.D. Ullman, and J. Widom, Database System Implementation, Prentice-Hall, 2000.
– H. Korth and A. Silberschatz, Database System Concepts (third edition), McGraw-Hill, 1997.
– R.
Elmasri and S.B. Navathe, Fundamentals of Database Systems (second edition), The Benjamin/Cummings Publishing Company, 1994.
– C.J. Date, An Introduction to Database Systems (7th edition), Addison Wesley, 2000.

Database Technology is Constantly Evolving!
• Technical Journals
  – ACM Transactions on Database Systems (TODS), quarterly
  – IEEE Transactions on Knowledge and Data Engineering, quarterly
  – The VLDB Journal, quarterly
  – ACM SIGMOD Record, IEEE Data Engineering Bulletin
• Major conferences
  – ACM SIGMOD International Conference on Management of Data
  – IEEE International Conference on Data Engineering
  – International Conference on Very Large Data Bases
  – ACM SIGACT-SIGMOD Symposium on Principles of Database Systems
  – DASFAA, DEXA, etc.
• Trade Journals: Data Base Newsletter, Database Review, InfoDB, Database Programming & Design, etc.
• Trade shows: Database World, DB/Expo
• A number of vendor reference manuals, technical reports, etc.
• Roughly 100,000 pages of new material published every year!

Traditional File Systems (1)
– It is possible to use a file system, which is part of an operating system (say, Unix, Windows, …), to manage (create, update, retrieve, …) data
– File systems may support a number of primitive operations, with which users can construct various kinds of access methods (such as sequential, indexed sequential, hashing, …) for files stored on hard disks
– File-based systems tend to consist of a set of different data files and different application programs, which work independently

Traditional File Systems (2)
[Figure, layers from top to bottom]
User 1, User 2, ..., User n
Application program 1, Application program 2, ..., Application program n
Operating system: File system
File 1, File 2, ...,
File n

Problems in Traditional File Systems
– Each user defines and implements the files needed for a specific application
  • data redundancy and inconsistency
  • wasted space
– The structure of data files is embedded in the application programs
  • any change to the structure of a data file requires changing all programs that use it
– Users need to know the details of physical data organization
  • no data abstraction
  • not self-describing (a file system usually does not contain a description or definition of the data)
– No support for multiple views of data
– Data security problem
  • difficult to protect data from malicious access
– Data integrity problem
  • difficult to protect data from accidental loss of consistency

Objectives of Database Systems (1)
• Controlled data redundancy
  – sometimes it is better to permit redundant data, particularly in a distributed DBMS
• Data sharing
  – allows multiple users to access the databases at the same time
  – provides multiple interfaces
  – supports concurrency control
• Data independence
  – immunity of application programs to the underlying physical organization and to changes in it

Objectives of Database Systems (2)
• High-level query language support
  – Easy to use; hides internal data structures
  – Example: SQL
  – Tends to be declarative (cf. procedural languages)
  – Relies on query optimization techniques
• Security and authorization facility
  – who can perform what operations on what data in what circumstances
• Enforcement of integrity constraints
  – ensures that data in the databases is accurate
• Data modeling and abstraction
• Recovery facility
  – databases should survive all kinds of failures that can occur

Actors on the Scene
– DBA (Database administrator): a chief administrator
  • system monitoring and maintenance
  • authorizing database accesses (database security)
  • storage structure and access method definition
  • schema definition and maintenance, etc.
– Database designer
  • responsible for database design
  • requirement analysis for particular applications
  • schema definition (creation) and authorization
– Application programmer
  • develops application programs for particular applications
– End users, casual users, naive users
  • use QBF (Query by Form), report generators, and various canned transactions

Classifications of DBMS (1)
– The following classification is based on the data model
– Hierarchical databases
  • first developed in the mid-1960s
  • IBM IMS (Information Management System)
– Network databases
  • CODASYL Database Task Group Report (1971)
  • IDMS (Cullinet), TOTAL (Cincom)
– Relational databases
  • relational data model (by E.F. Codd, 1970)
  • most widely used currently
  • numerous commercial systems: DB2, Oracle, Informix, SQL/DS, Sybase, Access, MS SQL Server, etc.
– Object-oriented databases
  • first appeared in the mid-1980s
  • mainly intended for special applications: CAD/CAM, engineering databases, etc.
– Object-relational databases
  • relational DBMS + OO DBMS

Relational Database Systems
• A database is viewed as a set of relations (tables) and constraints
  Relation 구좌 (Account):
    구좌번호 (account number) | 잔고 (balance) | 유형 (type)
    12345 | 1,000,000 | 보통예금 (ordinary deposit)
    23456 | 120,000 | 정기적금 (installment savings)
    … | … | …
• Query: Retrieve the balance of account 12345
  – Select 잔고 From 구좌 Where 구좌번호 = 12345;
• Query: Retrieve the account numbers of installment savings accounts whose balance is 0 or less
  – Select 구좌번호 From 구좌 Where 잔고 <= 0 AND 유형 = '정기적금';

Classifications of DBMS (2)
• Where are the databases located?
  – centralized databases vs. distributed databases
  – distributed databases
    • homogeneous databases
    • heterogeneous databases
      – federated databases
      – multidatabases
• What functionality is emphasized?
  – Deductive databases
  – Active databases
  – Real-time databases
  – Temporal databases, etc.

Data Model
• A DBMS should support some level of data abstraction by hiding details of internal data organization (particularly, the physical data storage structure)
• Data model
  – the main tool for providing data abstraction
  – conceptual tools for describing data and data relationships
  – i.e.
used to describe the structure of a database (data types, relationships, constraints, etc.)

Categories of Data Models
• High-level (conceptual) data model
  – a human-oriented data model
  – concepts such as entities, relationships, attributes
  – represented by the Entity-Relationship model
• Implementation data model
  – high-level description of the implementation
  – used most frequently in current commercial DBMSs
  – relational model, network model, hierarchical model, object-oriented model
• Low-level (physical) data model
  – describes how data is stored in the computer (i.e. record format, record ordering, access path, etc.)

Entity-Relationship diagram example
[Figure: ER diagram relating the entities "student" and "course" through an M:N relationship "taken"; attribute labels shown in the diagram: id, name, level, class, age, name, dept, year, hours, credit, semester]

Schema and Instance
– Schema
  • loosely speaking, a description (definition) of data, such as item names, data types, constraints, etc.
  • does not change frequently
  • also called "intension"
– Instance (occurrence)
  • the actual data (contents) stored under some schema
  • expected to change frequently
  • also called the "extension of the schema"
– Note the difference between a database schema and a database instance

Three-schema Architecture (ANSI/SPARC Architecture)
• Goal
  – Program-data independence (insulation of programs and data)
  – Support of multiple user views
• Schema is defined at the following three levels
  – External schema
    • A description of the part of the database in which a particular user is interested
  – Conceptual schema
    • Describes the structure of the whole database
    • A global description of the database that hides the details of physical storage structures
  – Internal schema
    • Describes the physical storage structure of the database (record types, physical sequence of stored records, what indexes exist, access paths, etc.)

Three-schema Architecture (2)
[Figure, layers from top to bottom]
External (user) view: User 1 ... User n, each with External schema 1 ... External schema n
external/conceptual mapping
Conceptual view: Conceptual schema
conceptual/internal mapping
Internal view: Internal schema

Example of the Three Schema (3)
EXTERNAL (PL/1)
  DCL 1 EMPP,
        2 EMP# CHAR(6),
        2 SAL  FIXED BIN(31);
EXTERNAL (COBOL)
  01 EMPC.
     02 EMPNO  PIC X(6).
     02 DEPTNO PIC X(4).
CONCEPTUAL
  EMPLOYEE
    EMPLOYEE_NUMBER    CHAR(6)
    DEPARTMENT_NUMBER  CHAR(4)
    SALARY             NUMERIC(5)
INTERNAL
  STORED_EMP  LENGTH = 20
    PREFIX  TYPE=BYTE(6),  OFFSET=0
    EMP#    TYPE=BYTE(6),  OFFSET=6, INDEX=EMPX
    DEPT#   TYPE=BYTE(4),  OFFSET=12
    PAY     TYPE=FULLWORD, OFFSET=16

Three-schema Architecture (4)
• Serves as a reference DBMS architecture even though most DBMSs do not support the three levels completely
• Data actually exist at the physical level only
  – "mappings" between levels need to be provided
• Most relational systems also permit one external view to be defined in terms of other external views (i.e. an external/external mapping)

Data Independence
• Capacity to change the schema at one level without having to change the schema at the next higher level
• Logical data independence
  – Capacity to change the conceptual schema without having to change external schemas
  – Only the mapping between the conceptual schema and the external schemas needs to be changed
• Physical data independence
  – Capacity to change the internal schema without having to change the conceptual (or external) schema
  – Only the mapping between the conceptual schema and the internal schema needs to be changed
• The three-level schema architecture is what makes data independence possible

Database Languages
• Data Definition Language (DDL)
  – To define, alter, and drop schemas
  – e.g. create table, create schema, drop table, etc.
• Data Manipulation Language (DML)
  – To retrieve, insert, delete and modify database instances
  – Tends to be declarative (i.e. non-procedural)
  – e.g. insert record, retrieve record, delete record, etc.
• Data Control Language (DCL)
  – To control and monitor various database operations (such as authorization, server connection, transaction processing, etc.)
  – e.g. grant, revoke, commit, rollback, prepare, etc.

Trend of Modern Database Systems (1)
• Types, classes, objects
  – Object-oriented paradigm
  – Rich set of types, OID, encapsulation, abstract data type, inheritance
  – Example
      Class Account = { account#: integer; balance: real; owner: REF Customer; }
      Deposit(a: Account, m: real) // method example
• Constraints and triggers
  – An active DBMS is a DBMS that allows users to specify actions to be taken automatically, without user intervention, when certain conditions arise
  – "ON" conditions of CODASYL

Trend of Modern Database Systems (2)
• Multimedia data
  – Inclusion of multimedia data such as video, audio, image, text, etc.
  – Very large size (up to terabytes, petabytes), unstructured in nature
  – Poses new technical issues such as data analysis, presentation synchronization, query optimization, data buffering, tertiary storage, …
• Data integration
  – How to interoperate with legacy databases
  – Multidatabases
  – Data warehouses, OLAP (On-line analytical processing)
  – Data mining: search for interesting and unusual patterns in data
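
Example: a trigger in an active DBMS
The "Constraints and triggers" item above describes an active DBMS as one that takes user-specified actions automatically when certain conditions arise. The following is a minimal sketch of that idea, assuming Python's built-in sqlite3 module as a stand-in DBMS. The names account, overdraft_log, log_overdraft, withdraw and overdrafts are hypothetical and do not come from the lecture. The trigger records every update that leaves a balance below zero, without the application asking for it.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one table, one log table, one trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- DDL: define the schema (hypothetical table and column names)
        CREATE TABLE account (
            account_no INTEGER PRIMARY KEY,
            balance    INTEGER NOT NULL,
            kind       TEXT    NOT NULL
        );
        CREATE TABLE overdraft_log (
            account_no INTEGER,
            balance    INTEGER
        );
        -- Active rule: fires by itself whenever an update makes a balance negative
        CREATE TRIGGER log_overdraft
        AFTER UPDATE OF balance ON account
        WHEN NEW.balance < 0
        BEGIN
            INSERT INTO overdraft_log VALUES (NEW.account_no, NEW.balance);
        END;
        """
    )
    # DML: populate the instance with the two sample accounts
    conn.executemany(
        "INSERT INTO account VALUES (?, ?, ?)",
        [(12345, 1000000, "ordinary deposit"),
         (23456, 120000, "installment savings")],
    )
    conn.commit()
    return conn


def withdraw(conn, account_no, amount):
    """Plain update; the application never touches overdraft_log itself."""
    conn.execute(
        "UPDATE account SET balance = balance - ? WHERE account_no = ?",
        (amount, account_no),
    )
    conn.commit()


def overdrafts(conn):
    """Return the rows the trigger has written so far."""
    return conn.execute(
        "SELECT account_no, balance FROM overdraft_log ORDER BY account_no"
    ).fetchall()


conn = build_demo_db()
withdraw(conn, 12345, 500000)  # balance stays positive: trigger condition is false
withdraw(conn, 23456, 200000)  # balance becomes -80000: trigger fires
print(overdrafts(conn))        # prints [(23456, -80000)]
```

The point of the design is that the rule lives in the database rather than in each application program, so every program that updates account is subject to it.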