INF 280 Database Systems: Basic Concepts
D. Christozov / G.Tuparov

Typical software application
[Diagram labels: Interface; Business Logic (transforming interface into data request, transforming datasets into reports/forms); Query (SQL); Datasets; Data Processing; Database]

Basic Concepts - Topics
1. A database as a collection of related data
2. Database and Database Management System
3. Characteristics and advantages of DB approach
4. DB users
5. DB Architecture
6. DBMS Architecture

DB as a collection of related data (1)
• Data: facts that can be recorded and that have implicit meaning.
• Database implicit properties:
– A database represents some aspect of the real world, sometimes called the miniworld or the Universe of Discourse (UoD).
– A database is a logically coherent collection of data with some inherent meaning.
– A database is designed, built, and populated with data for a specific purpose. It has an intended group of users and some applications in which these users are interested.

DB as a collection of related data (2)
[Figure]

Basic characteristics
1. Self-Describing Nature of a Database System
2. Insulation between Programs and Data, Data Abstraction
3. Support of Multiple Views of the Data
4. Sharing of Data and Multiuser Transaction Processing

Basic characteristics (1)
Self-Describing Nature of a Database System
• The system catalogue contains information about the structure of each file, the type and storage format of each data item, and various constraints on the data. The information stored in the catalogue is called meta-data, and it describes the structure of the primary database.
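The self-describing property can be seen in any DBMS that exposes its catalogue. The following is a minimal sketch, assuming SQLite (through Python's standard `sqlite3` module) as the DBMS; the `student` table and the helper name `describe_catalogue` are hypothetical and are not part of the course material.

```python
import sqlite3

def describe_catalogue(conn):
    """Read table and column meta-data back from SQLite's own catalogue."""
    # sqlite_master is SQLite's catalogue table: one row per schema object.
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    return {
        t: [(col[1], col[2]) for col in conn.execute(f"PRAGMA table_info({t})")]
        for t in tables
    }

conn = sqlite3.connect(":memory:")
# Hypothetical table, used only to have something to describe.
conn.execute(
    "CREATE TABLE student (id TEXT PRIMARY KEY, name TEXT NOT NULL, address TEXT)")

catalogue = describe_catalogue(conn)
print(catalogue)
```

The program never hard-codes the structure of `student`; it recovers the structure from the meta-data that the DBMS stores alongside the data.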
Basic characteristics (2)
Insulation between Programs and Data, and Data Abstraction
• The characteristic that allows program-data independence and program-operation independence is called data abstraction.
• A DBMS provides users with a conceptual representation of data that does not include many of the details of how the data is stored or how the operations are implemented. A data model (or logical data model) is a type of data abstraction that is used to provide this conceptual representation. The data model hides storage and implementation details.

Basic characteristics (3)
Support of Multiple Views of the Data
• A view may be a subset of the database, or it may contain virtual data that is derived from the database files but is not explicitly stored.
• Different categories of users need different views of the database.
• One user may need to solve different problems with the database and may need a different view of the data for each problem.

Basic characteristics (4)
Sharing of Data and Multiuser Transaction Processing
• Multiple users may need to access the database simultaneously.
• The DBMS must include concurrency control software to ensure that several users trying to update the same data do so in a controlled manner, so that the result of the updates is correct.
• On-line transaction processing (OLTP) applications.

INF 280: Database Systems, Concurrency Control
D. Christozov

Transaction Processing and Concurrency Control (1)
Transaction: execution of a program that accesses and/or changes the content of a file.
Concurrency: concurrent execution of two or more transactions.
Concurrency Control: mechanisms to avoid failures, losses, etc. in concurrent execution of transactions.
Single vs. Multiuser/Multitasking Systems: Time Sharing
System Log: journal (file) that holds the history of changes to the state of a database.

Transaction Processing and Concurrency Control (2)
ACID Properties of Transaction:
A  Atomicity: A transaction is either performed in its entirety or not at all.
C  Consistency Preservation: A correct execution of a transaction takes the database from one consistent state to another consistent state.
I  Isolation: A transaction should not make its updates visible to other transactions until it is committed.
D  Durability: Once a transaction changes the DB and the changes are committed, these changes must never be lost because of a subsequent failure.

Transaction Processing and Concurrency Control (3)
Schedule: A schedule S of n transactions T1, T2, …, Tn is an ordering of the execution of operations of the transactions. Operations of two transactions Ti and Tj can be interleaved.
Recoverability: Ability to recover from transaction failure. A schedule S is recoverable if no transaction T in S commits until all transactions T’ that have written an item that T reads have committed.
Serializability: The concurrent execution of transactions is equivalent to a serial execution. Serial, Non-Serial, and Conflict Schedules.
Protocols: sets of rules to guarantee “serializability”.

Transaction Processing and Concurrency Control (4)
Granularity: what portion of the DB the data item represents:
– field of a record
– record
– block
– file
– DB space
– whole database
Locking: prevents multiple transactions from accessing the same item concurrently.
Timestamps: uses a unique identifier for each transaction.
Multiversion: the system uses multiple versions of the same data item.
Optimistic: validation and certification of transactions.

Transaction Processing and Concurrency Control (5)
Locks:
• Binary lock: two states (locked/unlocked) for each item;
• Shared/exclusive lock: three states: read-locked, write-locked, unlocked;
• Two-phase locking: all locking operations precede the first unlock operation. First phase – expanding; second phase – shrinking. Basic, Conservative, Strict Two-phase locking.
• Deadlock: each of two transactions is waiting for the other to unlock a data item.
• Livelock: a transaction keeps waiting while the others continue.

Transaction Processing and Concurrency Control (6)
Timestamps: order transactions according to their timestamps.
Multiversion: keeps the old values when the item is updated.
Optimistic: no checking during execution of the transaction; all updates are applied to a local copy of the data item. After execution, a validation phase is performed to check serializability.

Transaction Processing and Concurrency Control (7)
Testing schedules for serializability:
1. Only read_item and write_item operations are interesting.
2. The algorithm is based on constructing a precedence (serialization) graph for the schedule: a directed graph G = (N, E), where N = {T1, T2, …, Tn} is the set of nodes and E = {e1, e2, …, em} is the set of edges. There is one node for each transaction Ti. An edge ei is a precedence (Tj → Tk), where Tj is the starting node and Tk the ending node: one operation in Tj appears in the schedule BEFORE some conflicting operation in Tk.

Transaction Processing and Concurrency Control (8)
Algorithm for testing “serializability” of a schedule S:
1. For each transaction Ti, create a node in a precedence graph G.
2. If in S Tj:read_item(X) is after Ti:write_item(X), create an edge (Ti → Tj).
3. If in S Tj:write_item(X) is after Ti:read_item(X), create an edge (Ti → Tj).
4. If in S Tj:write_item(X) is after Ti:write_item(X), create an edge (Ti → Tj).
The schedule S is serializable if and only if G has no cycles.

Transaction Processing and Concurrency Control (9)
Examples: Serial Schedules
T1                  T2
Read_item(X)
X:=X-N
Write_item(X)
Read_item(Y)
Y:=Y+N
Write_item(Y)
                    Read_item(X)
                    X:=X+M
                    Write_item(X)
Precedence graph: T1 → T2 (no cycle)

Transaction Processing and Concurrency Control (10)
Examples: Non-Serial Schedules
T1                  T2
Read_item(X)
X:=X-N
                    Read_item(X)
                    X:=X+M
Write_item(X)
Read_item(Y)
                    Write_item(X)
Y:=Y+N
Write_item(Y)
Precedence graph: T1 → T2 (on X), T2 → T1 (on X). Cycle: {X}

Transaction Processing and Concurrency Control (11)
Lost update problem:
Transactions
T1: Read_item(X); X:=X-N; Write_item(X); Read_item(Y); Y:=Y+N; Write_item(Y);
T2: Read_item(X); X:=X+M; Write_item(X);
Schedule
T1                  T2
Read_item(X);
X:=X-N;
                    Read_item(X);
                    X:=X+M;
Write_item(X);
Read_item(Y);
                    Write_item(X);
Y:=Y+N;
Write_item(Y);
The two transactions access and update the same DB item simultaneously.

Transaction Processing and Concurrency Control (11)
Dirty Read (temporary update problem):
Transactions
T1: Read_item(X); X:=X-N; Write_item(X); Read_item(Y); Y:=Y+N; Write_item(Y);
T2: Read_item(X); X:=X+M; Write_item(X);
Schedule
T1                  T2
Read_item(X);
X:=X-N;
Write_item(X);
                    Read_item(X);
                    X:=X+M;
                    Write_item(X);
Read_item(Y);
failure
One transaction updates an item and then fails before correctly updating item Y; meanwhile, another transaction has used the already updated item.
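The precedence-graph test described above can be sketched in a few lines. This is an illustrative sketch, not code from the course: the tuple encoding of a schedule and the function names (`precedence_graph`, `has_cycle`, `is_serializable`) are assumptions made here. Strictly speaking, it checks conflict serializability, which is what the three edge rules capture. The two sample schedules are the serial schedule and the interleaved (lost update) schedule from the examples.

```python
from collections import defaultdict

def precedence_graph(schedule):
    """Build the edge set from (transaction, op, item) triples; op is 'r' or 'w'."""
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Two operations conflict when they belong to different transactions,
            # touch the same item, and at least one of them is a write.
            if ti != tj and item_i == item_j and "w" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(nodes, edges):
    """Depth-first search for a cycle in a directed graph."""
    adjacent = defaultdict(list)
    for a, b in edges:
        adjacent[a].append(b)
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {n: WHITE for n in nodes}

    def visit(n):
        colour[n] = GREY
        for m in adjacent[n]:
            if colour[m] == GREY or (colour[m] == WHITE and visit(m)):
                return True
        colour[n] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in nodes)

def is_serializable(schedule):
    nodes = sorted({t for t, _, _ in schedule})
    return not has_cycle(nodes, precedence_graph(schedule))

# Serial schedule: all of T1, then all of T2.
serial = [("T1", "r", "X"), ("T1", "w", "X"), ("T1", "r", "Y"), ("T1", "w", "Y"),
          ("T2", "r", "X"), ("T2", "w", "X")]
# Interleaved schedule from the lost update example.
interleaved = [("T1", "r", "X"), ("T2", "r", "X"), ("T1", "w", "X"),
               ("T1", "r", "Y"), ("T2", "w", "X"), ("T1", "w", "Y")]

print(precedence_graph(serial), is_serializable(serial))
print(precedence_graph(interleaved), is_serializable(interleaved))
```

The assignments such as X:=X-N are left out of the encoding because, as the algorithm states, only the read_item and write_item operations matter for the test.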
Transaction Processing and Concurrency Control (12)
Incorrect summary problem:
Transactions
T1: Read_item(X); X:=X-N; Write_item(X); Read_item(Y); Y:=Y+N; Write_item(Y);
T2: Read_item(A); Sum := Sum+A; Read_item(X); Sum := Sum+X; Read_item(Y); Sum := Sum+Y;
Schedule
T1                  T2
                    Read_item(A);
                    Sum := Sum+A;
Read_item(X);
X:=X-N;
Write_item(X);
                    Read_item(X);
                    Sum := Sum+X;
                    Read_item(Y);
                    Sum := Sum+Y;
Read_item(Y);
Y:=Y+N;
Write_item(Y);
One transaction calculates an aggregate function while another updates the same records.

Advantages of Using DBMS
• Controlling Redundancy (reducing)
• Preserving Data Integrity
• Restricting Unauthorized Access
• Providing Persistent Storage for Program Objects and Data Structures (Object-Oriented DB)
• Permitting Inferencing and Actions Using Rules
• Providing Multiple User Interfaces
• Representing Complex Relationships Among Data
• Providing Backup and Recovery

Redundant Data
Id#     Name             Address   Code     Title      Cr.  Instructor  Section  Grade
000101  Ivan Ivanov      Scapto 1  COS 480  DB System  3    Christozov  A        B-
000101  Ivan Ivanov      Scapto 1  COS 221  FDS        3    Christozov  B        B+
000101  Ivan Ivanov      Scapto 1  AUB 102  Writing    3    Colman      C        D+
000102  Georgi Georgiev  Scapto 2  COS 480  DB System  3    Christozov  A        B+
000102  Georgi Georgiev  Scapto 2  AUB 102  Writing    3    Colman      C        C+
Student’s information: Id#, Name, Address
Course information: Code, Title, Cr., Instructor, Section
Grade information: Grade

Integrity
Grades
Id#     Name             Address   Code     Title      Cr.  Instructor  Section  Grade
000101  Ivan Ivanov      Scapto 1  COS 480  DB System  3    Christozov  A        B-
000101  Ivan Ivanov      Scapto 1  COS 221  FDS        3    Christozov  B        B+
000101  Ivan Ivanov      Scapto 1  AUB 102  Writing    3    Colman      C        D+
000102  Georgi Georgiev  Scapto 2  COS 480  DB System  3    Christozov  A        B+
000102  Georgi Georgiev  Scapto 2  AUB 102  Writing    3    Colman      C        C+
Faculty (the row for instructor Christozov is missing)
Family Name  Given Name  Title             Office
Bonev        Stoyan      Assoc. Professor  221
Colman       Mark        Professor         231

Actors on the Scene
• DB Administrators
• DB Designers
• End Users:
– Casual
– Naive (parametric)
– Sophisticated
– Stand-alone

DB Administrators
• The DBA is responsible for authorizing access to the database, coordinating and monitoring its use, and acquiring software and hardware resources as needed.
• The DBA is accountable for problems such as security breaches and poor system response time. In large organizations, the DBA is assisted by a staff that carries out these functions.

Database designers
• Database designers are responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data. These tasks are mostly undertaken before the database is actually implemented and populated with data.
• It is the database designers' responsibility to communicate with all prospective database users in order to understand their requirements and to create a design that meets these requirements.

End Users
• Casual end users occasionally access the database, but they may need different information each time. They use a sophisticated database query language to specify their requests and are typically middle- or high-level managers or other occasional browsers.
• Naive or parametric end users: their main job function revolves around constantly querying and updating the database, using standard types of queries and updates, called canned transactions, that have been carefully programmed and tested. Examples: bank tellers, reservation agents, etc.

End Users (cont.)
• Sophisticated end users include engineers, scientists, business analysts, and others who thoroughly familiarize themselves with the facilities of the DBMS in order to implement their own applications to meet their complex requirements.
• Standalone users maintain personal databases by using ready-made program packages that provide easy-to-use menu-based or graphics-based interfaces. An example is the user of a tax package that stores a variety of personal financial data for tax purposes.

Actors Behind the Scene
• DBMS system designers and implementers design and implement the DBMS modules and interfaces as a software package.
• Tool developers design and implement tools: the software packages that facilitate database modeling and design, database system design, and improved performance.
• Operators and maintenance personnel are responsible for the actual running and maintenance of the hardware and software environment for the database system.

DB History
Database Systems: the success story of Computer Science
• Early applications: use of File Systems
• 1960s: Hierarchical and Network DB models
• Late 1970s: Codd’s Relational Model
• Late 1980s: OODB -> R-OO DB
• 1990s: SQL standards, WWW, E-Commerce
• Spatial DB, Data Warehouses, Data Mining
DB Model
Data Model: collection of concepts that can be used to describe the structure of a database.
Structure: data types; relationships; constraints
Operation: retrieve, insert, delete, modify, user-defined operations
Behavior: dynamic
Object-Oriented Models incorporate both structure and behavior.
In “classical” models (hierarchical, network or relational) behavior is limited to generic operations.

DB Model: Categories of Data Models
High-level – conceptual: how users perceive data.
Low-level – physical: how data is actually stored on the computer.
Representational – logical: close to the way users understand data, but allows direct interpretation by a given DBMS.
Database schema: description of the database model. Most data models have certain conventions for displaying schemas as diagrams.

Schema Instance State
• In any data model, it is important to distinguish between the description of the database and the database itself. The description of a database is called the database schema, which is specified during database design and is not expected to change frequently.
• The data in the database at a particular moment in time is called a database state or snapshot. It is also called the current set of occurrences or instances in the database.

DB and DBMS
• A database management system (DBMS) is a collection of programs that enables users to create and maintain a database.
• The DBMS is a general-purpose software system that facilitates the processes of
– defining,
– constructing, and
– manipulating databases for various applications.

DBMS
• DBMS supports the following categories of languages:
– Data definition language (DDL)
– Storage definition language (SDL)
– View definition language (VDL)
– Data manipulation language (DML), including querying language
• Note: In current DBMSs, these types of languages are not considered distinct languages.

DBMS Components
[Diagram labels: DB designers; DB administrators; Sophisticated Users; Naive Users; Query compiler; DML compiler; DDL interpreter; System catalogue; Run-time processor; Boundaries of DBMS; Data manager; Concurrency control, recovery, backup subsystems; Recorded DB]

DBMS Architecture
• The Three-Schema Architecture
1. The internal level (internal schema) describes the physical storage structure of the database.
2. The conceptual level (conceptual schema) describes the structure of the whole database for a community of users.
3. The external or view level includes a number of external schemas or user views.
• Data Independence
1. Logical data independence
2. Physical data independence

The Three-Schema Architecture
[Diagram labels: Categories of Users; External Schemas; Logical Data Independence; Logical Schema; Physical Data Independence; Physical Schema; Data Files; Master Files; Indexes; Meta Data; System Catalog]

Database System Utilities
1. Loading
2. Backup
3. File reorganization
4. Performance monitoring

Q&A
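As an illustration of the language categories listed on the DBMS slide, the sketch below issues a data definition statement, a view definition, and data manipulation statements through one SQL interface, which also reflects the note that current DBMSs do not treat these as distinct languages. It assumes SQLite via Python's standard `sqlite3` module; the `grade` table and the `writing_grades` view are hypothetical names, with rows borrowed from the Grades example. Storage definition is left to the engine and is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure (conceptual level).
conn.execute(
    "CREATE TABLE grade (student_id TEXT, course_code TEXT, grade TEXT)")

# VDL: define an external view as virtual data derived from the stored table.
conn.execute(
    "CREATE VIEW writing_grades AS "
    "SELECT student_id, grade FROM grade WHERE course_code = 'AUB 102'")

# DML: populate the table, then query through the view.
conn.executemany(
    "INSERT INTO grade VALUES (?, ?, ?)",
    [("000101", "COS 480", "B-"),
     ("000101", "AUB 102", "D+"),
     ("000102", "AUB 102", "C+")])

rows = conn.execute(
    "SELECT student_id, grade FROM writing_grades ORDER BY student_id").fetchall()
print(rows)
```

The view stores no rows of its own: the query against `writing_grades` is answered from the `grade` table, which is the sense in which a view contains virtual data.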