* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Introduction to Database Systems
Survey
Document related concepts
Transcript
70420403- Database Systems 数据库管理系统 Li Jianguo [email protected] 1 References Raghu Ramakrishnan/Johannes Gehrke, Database Management Systems, 2nd Ed. , 清华大学出版社 /McGraw-Hill, 2000 C. J.Date, 数据库系统导论(第7版), 机 械工业出版社 2 What is DBMS? Need for information management A very large, integrated collection of data. Models real-world enterprise. – – Entities (e.g., students, courses) Relationships (e.g., Liu Wei is taking 70420403) A Database Management System (DBMS) is a software package designed to store and manage databases. 3 Why Use a DBMS? Data independence and efficient access. Data integrity and security. Uniform data administration. Concurrent access, recovery from crashes. Replication control Reduced application development time. 4 File Systems program 1 data description 1 File 1 program 2 data description 2 File 2 program 3 data description 3 File 3 5 Database Management Application program 1 (with data semantics) Application program 2 (with data semantics) DBMS description manipulation control database Application program 3 (with data semantics) 6 Why Study Databases?? Shift from computation to information – – at the “low end”: access to physical world at the “high end”: scientific applications Datasets increasing in diversity and volume. – – ? Digital libraries, interactive video, Human Genome project, e-commerce, sensor networks ... need for DBMS/data services exploding DBMS encompasses several areas of CS – OS, languages, theory, AI, multimedia, logic 7 Database Systems 8 Key Terms cont. Database System – A database, a database management system and appropriate hardware and personnel. number, Designation Hours worked, Pay rate Insurance, Pension 9 Database System: Hardware Set of physical devices on which a database resides. It consists of one or more computers, disk drives, CRT terminals, printers, tape drives, connecting cables and other auxiliary and connecting hardware. 10 Database System: Hardware cont. Printer TAPE Client Server Client 11 Database System: Software A database software includes two types of software – General-purpose database management software, usually called the database management system (DBMS) – Application software that uses DBMS facilities to manipulate the database to achieve a specific business function, such as providing reports or documents, which can be used by users. 12 Database System: Software cont. Application software is generally written standard programming language such as C, or it may be written in a language (commonly called a fourth-generation language) supplied with the DBMS. These programs utilise the command language of the DBMS and make use of the information contained in the data dictionary. 13 Database System: Software cont. User Interface – Language, menus and other facilities by which users interact with various system components, such as application programs, the DBMS Computer-aided software engineering (CASE) tools – Automated tools used to design databases and application programs. 14 Data Dictionary/Directory (Repository) A subsystem that keeps track of the – definitions of all data items in the database. – relationships that exists between various data structures. – indexes that are used to access data quickly. – screen and report format definitions that may be used by various application programs. 15 Data Dictionary Definitions of data items in the database includes: – Elementary-level data items (fields), – group and record-level data structures, and – files or relational tables. 16 Metadata • Data that describe the properties or characteristics of other data. • Some of these properties include data definitions, data structures and rules or constraints. Item name, the data type, length, minimum and maximum allowable values (where appropriate) and a brief description of each data item. • Metadata allow database designers and users to understand what data exist, what the data mean. • Data without clear meaning can be confusing, misinterpreted or erroneous. 17 Metadata E.g. Data Item Name Type Value Length Min Max Description Name ID Dept Age 30 9 10 2 Character Number Character Integer Name Character 15 Manager Number 9 18 60 Employee Name Employee No. Dept. No. Employee Age Dept. Name Mgr. Emp. No. Employee No. (ID) unique Manager is an employee of the organisation 18 Database System: People Two different types of people (users and practitioners) are concerned with the database. Users – who need information from the database to carry out their primary business responsibility e.g. Executives, managers, staff, clerical personnel 19 Database System: People cont. Practitioners – people responsible for the database system and its associated application software. e.g. Database administrators, analysts, programmers, database and system designers, information systems managers. 20 Components of a Database System Four components: People, H/W, S/W, Data Practitioners (analysts and database designers) in consultation with users identify data needs and design database structures to accommodate these needs. The database structures are specified to the DBMS through the data dictionary. 21 Components of a DBS cont. Users enter data into the system by following specified procedures. The entered data are maintained on hardware media such as disks and tapes. Application programmes that access the database are written by practitioners and users to be run on computers. 22 Components of a DBS cont. Application Program Analyst Application Program DATA DICTIONARY/ DIRECTORY Application Program DBMS DATAB ASE Terminals User Computer PRINTER 23 Key Terms cont. Database Management System – Systems software that facilitates the management of a database. E.g. Oracle, Access, SQL Server 24 Database Management System (DBMS) DBMS – An application software that organises data into records in one or more databases and allows organising, accessing and sorting of the data in a variety of formats. 25 DBMS cont. Relational DBMS – Most common type of DBMS. Data elements are stored in different tables made up of rows and columns. Relates data in different tables through the use of common data element(s). 26 Components of a DBMS DBMS USER USER User Query and Reporting Facilities Application Program Development Facilities PROGRAMMER Data Dictionary/ Directory Subsystem DATA DICTIONARY/ DIRECTORY Security and Integrity Subsystem Database Access Subsystem DATAB ASE PROGRAMMER 27 What is not a DDBS? A timesharing computer system A loosely or tightly coupled multiprocessor system A database system which resides at one of the nodes of a network of computers - this is a centralized database on a network node 28 Data Security The database is a valuable resource needing protection. The DBMS provides database security by limiting access to the database to authorised personnel. Authorised users will generally be restricted as to the particular data they can access and whether they can update it. 29 Data Security Access is often controlled by passwords and by data views, which are definitions of restricted portions of the database. 1 2 3 4 Data items 30 Data Integrity The integrity and consistency of the database are protected via constraints on values that data items can have and by backup and recovery capabilities provided within the DBMS. Data constraint definitions are maintained in the data dictionary. 31 Data Integrity Backup and recovery are supported by software that automatically logs changes to the database and provides for a means of recovering the current state of the database in case of system failure. 32 Concurrent Data Access One of the chief functions of the DBMS is to support the access, retrieval and update of data in the database. The DBMS provides the physical mechanisms allowing multiple users to access a variety of related data quickly and efficiently. 33 User-Oriented Data Manipulation DBMS provides user-oriented data manipulation tools. Easy-to-use query languages allow users to formulate queries and request one-time reports directly from the database. Often query languages will contain facilities to format the results of queries as reports. 34 Data Manipulation Report generators have more powerful reporting facilities than those in the query language. 35 Application Development The DBMS commonly provides significant assistance to the application programmer. – Tools for screen, menu and report generation – application generators – compilers – data and view definition facilities 36 Application Development Modern database systems provide language components that are much more powerful than those of traditional languages, making the programming process itself considerably more efficient. – Developer 2000/PowerBuilder for Oracle – Visual Basic for Microsoft SQL server 37 Database Applications Databases range from those for a single user with a desktop computer to those on mainframe computers with thousands of users. Personal databases Workgroup databases Departmental databases Enterprise databases 38 Personal Computer Databases Designed to support one user with a standard alone PC. E.g. a sales person keeping track of this customer information with contact details. 39 Workgroup Databases A relatively small team of people (less than 25) who collaborate on the same project or application. E.g. a software development team maintaining a list of software objects. 40 Department Databases A department is a functional unit of an organisation. It is larger than a workgroup. Department databases are designed to support the various functions and activities of a department. E.g. a personnel database that is designed to track data concerning employees, jobs, skills and job assignments. 41 Enterprise Databases An enterprise is one whose scope is the entire organisation or enterprise. Such databases are intended to support organisation-wide operations and decision making. E.g. a large health care organisation that operates a group of medical centre's including hospitals, clinics and nursing homes. 42 Enterprise Databases cont. An enterprise database does support information needs from many departments. The most important type of enterprise database today is called a data warehouse. Data warehouse – An integrated decision support database whose content is derived from the various operational databases. 43 Database Environment Marketing Sales Accounting Advertising Accounts Receivable Accounts Payable Accounting Corporate Database Marketing Purchasing 44 3 Level ANSI/SPARC Architecture Made databases more independent of application Became a standard for the organisation of DBMS 45 3 Levels External – User’s and Application’s view of data Conceptual – Logical Data Model Physical – Physical data Model 46 External (Sub) Schema – defines the external view of data as seen by a user or program Conceptual Schema – defines the logical view of data as seen by all users and programs Physical (Internal) Schema – defines the physical view of data as seen by a DBMS 47 3 Level Architecture user a user i/program j program x sub-schema a sub-schema i sub-schema z conceptual schema physical schema Databa ses 48 Physical View • The DBMS must know – exact physical location – precise physical structure database Employee record A.B.C. De Silva |222, Galle Road, Colombo Name (20 characters) | Address (40 characters) 650370690V|Senior Lecturer NID (10 char) Designation (15 char) 49 Physical View • Physical view provides the disk drives, physical addresses, indexes and pointers. • Physical Database Design is the responsibility of the Database Administrator (DBA). No user is concerned with this view. • Physical devices to contain the data • Access methods to retrieve and update data • maintain and improve database performance 50 Logical View • The user/application must know – existence – logical reference NID 650370690V database Employee Name A.B.C. De Silva Designation Senior Lecturer Address 222, Galle Road, Colombo 51 Logical View • This is a single logical description of all data elements and their relationships. • It is the result of the conceptual design which involves analysis of all users information needs and data definition of data items needs to meet them. • SQL CREATE TABLE statement is used to define the data elements. 52 External View • The user/application see – authorised data – own format database Lecturer Name A.B.C. De Silva Department Dept. of Computer Science Designation Senior Lecturer Age 35 53 External View • This consists of user views of the database. • Each definable user group will have its own view of the database. • Each of these views gives a useroriented description of the data elements and relationships of which the view is composed. 54 User View • A logical description of some portion of the database that is required by a user to perform some task. E.g. Benefit application user will view part of the employee data, excluding data such as date of birth and salary. 55 External View • It can be derived directly from the conceptual schema. • The collection of all user views is the external level. • SQL CREATE VIEW statement is used to create these views and SQL GRANT statement is used to restrict its use to a user group. 56 External View cont. • External Views Allows to – hide unauthorised data e.g. salary, dob – provide user view e.g. view employee name, designation, department data taken from employee and department files – derive new attributes e.g. age derived from dob or nid 57 External View cont. • External Views Allows to – change unit of measurement e.g. show age in years or months – define security levels e.g. update access to employee file read-only to department file 58 Mapping between Levels • DBMS map or translate from one level to another. External Conceptual Conceptual Physical • Database exists in reality only at the internal level. 59 DML/SQL user a user i/program j program x sub-schema a sub-schema i sub-schema z SDDL conceptual schema DDL physical schema Users Databa ses Practitioners: DBA 60 Practitioners:DBA • Data Definition Language (DDL) - is the language component of a DBMS that defines each data element as it appears in the database. • Sub-Schema Data Definition Language (SDDL) - is the language component of a DBMS that defines data elements as it should appear to the end users and programmers. 61 User • Data Manipulation Language (DML) - is a language associated with a DBMS that is employed by end users and programmers to manipulate data in the database. • Structured Query Language (SQL) - pronounced as sequel, is the standard data manipulation for relational DBMSs. 62 Characteristics of Early DBMS Centralised Storage Management Transaction Management Recovery Facilities System Maintained Access Paths 63 Characteristics of Modern DBMS Query processing and optimisation Transaction management Concurrency control Database recovery Database security and authorisation Distributed databases Data warehousing and data mining 64 Function of a DBMS Functions of a DBMS are concerned with providing efficient flexible data processing capabilities without compromising data validity. Main Functions are: – – – – Shared data Control redundancy Data integrity Data security 65 Capabilities of a DBMS Capabilities vary considerable, but basic operations are: Data definition Data entry Data manipulation Data display 66 Database Approach Advantages • • • • • • • • • Program-data independence Minimal data redundancy Improved data consistency Improved data sharing Increased productivity of application development Enforcement of standards Improved data quality Improved data accessibility and responsiveness Reduced program maintenance 67 Program-Data/Data Independence • The separation of data descriptions (metadata) from the application programs that use the data. In the database approach data descriptions are stored in a central location called the data dictionary. This property allows an organisation’s data to change and evolve (within limits) without changing the application program that process the data. 68 Minimal Data Redundancy • Data files are integrated into a single, logical structure. Each primary fact is recorded (ideally) in only one place in the database. E.g. Employee data not with the payroll and benefit files. Note: Data redundancy is not eliminated entirely. Some data items will appear in more than one place (e.g. employee no.) to represent the relationship with others. 69 Improved Data Consistency • By eliminating (or controlling) data redundancy, we greatly reduce the opportunities for inconsistency. E.g. employee address is stored only once and hence we cannot have disagreement on the stored values. • Also, updating data values is greatly simplified and have avoid the wasted storage space. 70 Improved Data Sharing • A database is designed as a shared corporate resource. Authorised users are granted permission to use the database, and each user (or group of users) is provided one or more user views to facilitate this use. E.g. employee data common to payroll, benefit applications will be shared among different users. 71 Increased Productivity of Application Development • A major advantage of the database approach is that it greatly reduces the cost and time for developing new business applications. – Programmer could concentrate on the specific functions required for the new application, without having to worry about design or low-level implementation details; as related data have already been designed and implemented. 72 Increased Productivity of Application Development cont. – DBMS provides a number of high-level productivity tools such as forms and report generations and high-level languages that automate some of the activities of database design and implementation. 73 Enforcement of Standards – When the database approach is implemented with full management support, the database administration function should be granted single-point authority and responsibility for establishing and enforcing data standards. – Standards include naming conventions, data quality standards and uniform procedures for accessing, updating and protecting data. – Powerful set of tools for developing and enforcing these standards are available for some DBMS. 74 Improved Data Quality A number of tools and processes are available to improve data quality. – Database designers can specify integrity constraints that are enforced by the DBMS. – One of the objectives of a data warehouse environment is to clean up operational data before they are placed in the data warehouse. Constraint A rule that cannot be violated by database users. 75 Improved Data Accessibility and Responsiveness With relational database, end users without programming experience can often retrieve and display data, even when it crosses traditional departmental boundaries. – English-like query language SQL and query tools such as Query-By-Example provide such facilities. 76 Reduced Program Maintenance Stored data are changed frequently for variety of reasons such as new data items types are added, and data formats change (e.g. date format from two-digit to four digit). Data independence allows to reduce the program maintenance time. 77 Database Approach Disadvantages • DBMS are more vulnerable than file-based system because of the centralised nature of a large integrated database. • If a failure occurs the recovery process is more complex and some times may results in lost transactions. • Hardware, software and personnel cost are higher for DBMS. 78 Data Models A data model is a collection of concepts for describing data. A schema is a description of a particular collection of data, using the a given data model. The relational model of data is the most widely used model today. – – Main concept: relation, basically a table with rows and columns. Every relation has a schema, which describes the columns, or fields. 79 Levels of Abstraction Many views, single conceptual (logical) schema and physical schema. – – – Views describe how users see the data. Conceptual schema defines logical structure Physical schema describes the files and indexes used. View 1 View 2 View 3 Conceptual Schema Physical Schema Schemas are defined using DDL; data is modified/queried using DML. 80 Example: University Database Conceptual schema: – – – Physical schema: – – Students(sid: string, name: string, login: string, age: integer, gpa:real) Courses(cid: string, cname:string, credits:integer) Enrolled(sid:string, cid:string, grade:string) Relations stored as unordered files. Index on first column of Students. External Schema (View): – Course_info(cid:string, enrollment:integer) 81 Data Independence Applications insulated from how data is structured and stored. Logical data independence: Protection from changes in logical structure of data. Physical data independence: Protection from changes in physical structure of data. One of the most important benefits of using a DBMS! 82 Concurrency Control Concurrent execution of user programs is essential for good DBMS performance. – Because disk accesses are frequent, and relatively slow, it is important to keep the CPU humming by working on several user programs concurrently. Interleaving actions of different user programs can lead to inconsistency: e.g., check is cleared while account balance is being computed. DBMS ensures such problems don’t arise: users can pretend they are using a single-user system. 83 Transaction: An Execution Unit of a DB Key concept is transaction, which is an atomic sequence of database actions (reads/writes). Each transaction, executed completely, must leave the DB in a consistent state if DB is consistent when the transaction begins. – – – Users can specify some simple integrity constraints on the data, and the DBMS will enforce these constraints. Beyond this, the DBMS does not really understand the semantics of the data. (e.g., it does not understand how the interest on a bank account is computed). Why not? Thus, ensuring that a transaction (run alone) preserves consistency is ultimately the user’s responsibility! 84 Scheduling Concurrent Transactions DBMS ensures that execution of {T1, ... , Tn} is equivalent to some serial execution T1’ ... Tn’. – – – Before reading/writing an object, a transaction requests a lock on the object, and waits till the DBMS gives it the lock. All locks are released at the end of the transaction. (Strict 2PL locking protocol.) Idea: If an action of Ti (say, writing X) affects Tj (which perhaps reads X), one of them, say Ti, will obtain the lock on X first and Tj is forced to wait until Ti completes; this effectively orders the transactions. What if Tj already has a lock on Y and Ti later requests a lock on Y? What is it called? What will happen? 85 Ensuring Atomicity DBMS ensures atomicity (all-or-nothing property) even if system crashes in the middle of a Xact. Idea: Keep a log (history) of all actions carried out by the DBMS while executing a set of Xacts: – – Before a change is made to the database, the corresponding log entry is forced to a safe location. (WAL protocol.) After a crash, the effects of partially executed transactions are undone using the log. (Thanks to WAL, if log entry wasn’t saved before the crash, corresponding change was not applied to database!) 86 The Log The following actions are recorded in the log: – Ti writes an object: the old value and the new value. – Log record must go to disk before the changed page! Ti commits/aborts: a log record indicating this action. Log records chained together by Xact id, so it’s easy to undo a specific Xact (e.g., to resolve a deadlock). Log is often duplexed and archived on “stable” storage. All log related activities (and in fact, all CC related activities such as lock/unlock, dealing with deadlocks etc.) are handled transparently by the DBMS. 87 Databases make these folks happy ... End users and DBMS vendors DB application programmers – e.g. webmasters Database administrator (DBA) – – – – Designs logical /physical schemas Handles security and authorization Data availability, crash recovery Database tuning as needs evolve Must understand how a DBMS works! 88 These layers must consider concurrency control and recovery Structure of a DBMS A typical DBMS has a Query Optimization layered architecture. and Execution The figure does not Relational Operators show the concurrency Files and Access Methods control and recovery components. Buffer Management This is one of several Disk Space Management possible architectures; each system has its own variations. DB 89 Summary DBMS used to maintain, query large datasets. Benefits include recovery from system crashes, concurrent access, quick application development, data integrity and security. Levels of abstraction give data independence. A DBMS typically has a layered architecture. DBAs hold responsible jobs and are well-paid! DBMS R&D is one of the broadest, mature areas in CS. 90 What to Study in 70420403? ? DBMS basics – ER, relational model, SQL, DB design using FD Transaction management – atomicity, consistency, isolation, durability – concurrency control and recovery Databases/data services for real-time applications – scheduling and CC – QoS metrics Data services in emerging applications – event services, sensor networks, data aggregation, streaming data 91