Class 1 - Database Overview (3/30/1999)

Fundamental Database Concepts

Data
Data is a gathered body of facts that are of interest. It helps you keep records and make decisions.

Database
A database is a collection of sharable data (or information) for a common purpose. It is organized so that its contents can easily be accessed, managed and updated.

DBMS (Database Management System)
A program product for keeping computerized records about an enterprise (a layer of software between users and data storage). A DBMS consists of a collection of interrelated data and a set of programs to access those data. Ex: Oracle, Sybase, Informix…
Users send requests for accessing the database to the DBMS, and the DBMS performs operations on the data as requested.

DBMS Functions
• Interact with the users and data storage
• Security management
• Data integrity
• Backup and recovery
• Concurrency control
• Transaction support
• Data dictionary

Relational DBMS
• Uses tables to keep records
• Most popular DBMS

Properties of relational databases
• Represents data in the form of tables
• Rows must be uniquely identifiable (using primary keys; Oracle does not enforce it)
• No duplicate column names in a table (can be in different tables)
• Order of rows is insignificant (use ORDER BY to order rows when retrieving)
• Order of columns is insignificant (changing the order should not affect programs)
• Values are atomic (no multi-valued attributes and no repeating groups)
• No hard-coded relationships between tables (a relationship can be established by joins at any time; foreign keys enforce referential integrity but are not required for a join)
• Hidden physical implementation (users specify what is in the database but not how to store it; the DBMS does that)
• Uses system tables so the users can access the data dictionary
• Uses SQL to access data (set-oriented)
• Supports null values

Major Players

Product       Company
----------    ----------
ORACLE        ORACLE
INGRES        INGRES
DB2           IBM
INFORMIX      INFORMIX
SYBASE        SYBASE
SQL SERVER    MICROSOFT

Database Vendors' 1994 Revenues (in millions).
Source: IDC, 1995
[Pie chart. Legend: Informix, CA, Sybase, Oracle, IBM, Others. Values: $320, $397, $506, $2,514, $1,158, $1,482.]

Gartner Group predicts that by the year 2000, the “Big Five” database vendors will be IBM, Informix, Microsoft, Oracle, and Sybase.

Data Models
Address two major issues: structure and operations.

Hierarchical
• 1970s: IMS (Information Management System) from IBM
• A child can have one parent.
• Have to access the parent and work down to the child.

Network
• 1971: CA-IDMS/DB from Computer Associates International Inc.
• Introduced to overcome the hierarchical model’s limitations; an extended version of the hierarchical model.
• A child can have more than one parent.
• Requires complicated coding to manipulate data.

Relational
Comparison of the relational data model and the other two:

Relational                                      Hierarchical/Network
--------------------------------------------    ------------------------------------------
Strong mathematical foundations (Set Theory)    No theory: outgrowth of practice
Set-oriented                                    Record-oriented
Logical pointers (implicit)                     Actual physical pointers (explicit)
More user-friendly                              Need special knowledge of the DB structure
Uses tables, rows, columns…                     Uses trees/graphs
Strong data independence                        More difficult to change anything

SQL (Structured Query Language)
SQL is a standard interactive and programming language for getting information from and making changes to a database (DDL, DML, DCL). Although SQL is both an ANSI and an ISO standard, many database products have their own flavors that extend SQL, for example PL/SQL. But the basic operations are very standard.

Users
(1) End users
• Naive users: access the database via form- or menu-driven applications (which are written by application programmers).
• Casual end users: DB knowledge needed (e.g. middle- or high-level managers).
• Sophisticated end users: engineers and business analysts who are familiar with the DBMS.
(2) Application programmers
Those who write the menu- or form-driven applications used by naive users. They use a host language (C, C++, Pascal, COBOL, PL/I, VB, PowerBuilder) and a sublanguage for database queries (SQL).
• Embedded SQL: execute SQL statements within a program written in a regular language, for example C (Pro*C).
• PL/SQL: procedural language extension of SQL.
(3) DBA: database administrator
Functions:
• Schema definition, storage structure
• Security (access control, setting up accounts and roles)
• Monitoring and fine-tuning performance
• Backup and recovery
• Integrity constraints

Benefits of Database Approach
Before the advent of the DBMS, permanent records (information) were typically stored in various files, and different application programs were written to extract records from and add records to the appropriate files. Compared to the file processing system, the DBMS has the following advantages:

(1) It provides an integrated collection of data available to a variety of users.
Integrated: non-redundancy (no repetition) or controlled redundancy.

(2) Consistency
Redundancy can cause inconsistency: when there is duplicated storage, you have to update ALL the records.

(3) Integrity (accuracy)
Violated because of:
• Human error
• Redundancy
• Program bugs
• Machine failure (may cause incomplete transactions)

An important data accuracy issue is transaction control. To ensure data integrity and accuracy in case of system failure, atomicity of a transaction is required: either all of it completes or none of it does (no partial transaction).

Transaction: a logical unit of work performed by a collection of DML statements (DML being insert, update, delete…).

For example, transfer $1000 from the insured money market account to the checking account:

Insured money market    Checking
--------------------    --------
5000.00                 300.00

1. read I = 5000
2. I = I - 1000
3. write back I = 4000
4. read C = 300.00
5. C = C + 1000
6. write C = 1300

If the system fails at time 4, what happens to the customer's account?
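The transfer scenario above can be sketched in Python with the standard sqlite3 module. This is only an illustrative sketch, not Oracle code: the table name account, the function names and the fail_midway flag are assumptions made for the example. A simulated failure between the debit and the credit is undone with a rollback, so the transfer either happens completely or not at all.

```python
import sqlite3


def make_accounts():
    """Create an in-memory database holding the two example balances."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance REAL)")
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)",
        [("money_market", 5000.0), ("checking", 300.0)],
    )
    conn.commit()
    return conn


def transfer(conn, amount, fail_midway=False):
    """Move amount from money market to checking as one unit of work."""
    try:
        # Debit: steps 1-3 of the example.
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = 'money_market'",
            (amount,),
        )
        if fail_midway:
            # Hypothetical failure between the debit and the credit.
            raise RuntimeError("simulated system failure")
        # Credit: steps 4-6 of the example.
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = 'checking'",
            (amount,),
        )
        conn.commit()    # make both changes permanent together
    except RuntimeError:
        conn.rollback()  # discard the pending debit


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

Calling transfer(conn, 1000, fail_midway=True) leaves both balances unchanged, while a normal call moves the full amount.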
Solution for preventing this kind of atomicity anomaly:
• commit: makes all pending changes permanent
• rollback: discards all pending changes
• savepoint: allows a rollback to the savepoint marker

Oracle implements an implicit savepoint for each DML statement; if that statement fails, Oracle automatically rolls back to the last successful DML statement. But Oracle does not know what a transaction is from the business point of view. So, to ensure the atomicity of a transaction, you need to explicitly issue a commit or rollback command at the end of the transaction as you define it.

A transaction begins when the first executable SQL statement is executed. It ends when one of the following occurs:
• Commit or rollback
• DDL or DCL statements execute (automatic commits)
• User exits
• System crashes

Before a commit, data changes are stored in a buffer and only the user who makes the changes is able to see them. Other users will not see the changes until a commit is issued, at which point the changes are made permanent. Rollback will undo the changes back to the last commit, and rollback to savepoint A will undo the changes back to where A is.

(4) Security control
Not every user should be able to access all the data (e.g. payroll, salary). Database security can be classified into 2 categories: system security and data security. System security covers access and use of the database at the system level, such as username and password, and disk space allocated to users. Data security covers access and use of the database objects and the actions users can perform on those objects. They are controlled by 2 types of privileges: System privileges (access to the database, e.g. create users, create roles, remove users and roles, backup tables, remove tables.
A DBA who has high-level privileges can grant users specific system privileges like create table, create views, create procedures) and Object privileges (ability to manipulate the content of the database objects, such as insert, select, delete and update on tables and views, execute procedures…). Security control makes sure the right people access and manipulate the right data.

(5) Concurrent access control
Data can be shared. This means destructive interaction between concurrent transactions is possible. For example, the following case should be prevented.

E.g. a husband withdraws $300 from a checking account at branch A, and his wife withdraws $150 from the same account at another branch B.

                      at branch A                 at branch B
                      ------------------------    ------------------------
before transaction    C = 1000.00
same time             read account C = 1000.00    read account C = 1000.00
                      withdraw $300.00
                      C = C - 300 = 700
                      write back
                                                  withdraw $150.00
                                                  C = C - 150 = 850
                                                  write back

The final result: they have $850 in the account after withdrawing $450.00 (the correct balance would be $550.00).

The DBMS guarantees that data can be read at the same time but not written at the same time. This is accomplished using a locking mechanism. With Oracle, locking is done automatically, and Oracle uses the lowest level of locking to provide the highest degree of concurrency while still providing maximum data integrity. Locks are held for the duration of the transaction (released when commit or rollback is issued).

Basic locking modes
• Exclusive lock: nobody else can
• Share

Implicit locking: when insert, update, delete… (Oracle does it automatically).
Explicit locking: a user can obtain an exclusive lock by explicitly issuing the select for update statement.
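The lost update in the two-branch example, and the way an exclusive lock prevents it, can be sketched in plain Python. This is an illustration of the idea only, not of how Oracle implements locking; the Account class and both function names are assumptions made for the example, and threading.Lock stands in for an exclusive lock on the account row.

```python
import threading


def unsafe_interleaving(balance):
    """Replay the schedule from the example: both branches read before either writes."""
    read_a = balance        # branch A reads C = 1000
    read_b = balance        # branch B reads C = 1000 at the same time
    balance = read_a - 300  # branch A writes back 700
    balance = read_b - 150  # branch B overwrites with 850; A's update is lost
    return balance


class Account:
    """Account whose read-modify-write is protected by an exclusive lock."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def withdraw(self, amount):
        # Only one withdrawal at a time may read and write the balance.
        with self._lock:
            current = self.balance
            self.balance = current - amount


def locked_withdrawals(balance):
    """Run both withdrawals concurrently, serialized by the lock."""
    account = Account(balance)
    threads = [
        threading.Thread(target=account.withdraw, args=(amount,))
        for amount in (300, 150)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return account.balance
```

Starting from 1000, unsafe_interleaving returns 850 as in the example, while locked_withdrawals returns 550 regardless of which branch gets the lock first.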
Levels of locking:
• Row-level locking (insert, update, delete)
• Table-level locking (DDL operations)

(6) Data independence (last section)

Disadvantages of using DBMS
(1) High cost of DBMS software product
(2) High cost of maintenance
(3) High complexity

The three-level architecture of DBMS
When we compare the relational data model with the other two data models, one of its advantages is strong data independence. Data independence is achieved through the three-level architecture.

Three levels of the architecture (ANSI/SPARC: American National Standards Institute / Standards Planning and Requirements Committee)
Purpose: to provide data independence by separating the three levels, that is, to separate the way the database is physically represented from the way the users think about it.

In 1970, E.F. Codd introduced the relational data model. Part of the abstract of “A Relational Model of Data for Large Shared Data Banks” (Communications of the ACM, 13(6), 1970; reprinted in Readings in Database Systems, 1998) talks about data independence:

“Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation)…. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed.”

External level: concerned with the way the data is viewed by the individual users.
Internal level: concerned with the way the data is physically stored.
Conceptual level: level of indirection between the other two.

Schema: a collection of data structures, e.g. Employee (empid, name, DOB, salary, ...)

External level - logical view of individual users
Closest to the users; it depends on the individual user. Users: end users (naive and sophisticated). A user's view is only part of the entire DB.
Conceptual level - entire information defined in this level
Describes what data are stored in the database and the relationships that exist among the data (the entire database is described in terms of a small number of relatively simple structures).

Internal level - closest to physical storage
Describes how the data is actually stored (complex, low-level data structures are described in detail).

Without the conceptual level, any change on either side would require a corresponding change on the other side.
[Diagram: External level, Internal level]

Data Independence

Physical structure: the way data values are stored and organized in the system.
Logical structure: the user's view of the data.

Data independence
(1) Physical data independence
The physical structures of the DB can be changed without requiring a rewrite of application programs.
(2) Logical data independence
The logical structure of the database can be changed without affecting the application programs (or: the ability to modify the conceptual schema without causing application programs to be rewritten). Modification at the conceptual level is necessary whenever the logical structure of the database is altered.
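Both kinds of data independence can be illustrated with a small sketch using Python's standard sqlite3 module. Everything named here is an assumption for the example: the view emp_directory, the tables staff and payroll, the index name and the sample rows are hypothetical. The application only ever runs APP_QUERY against the view (its external level); adding an index changes the physical structure, and splitting the employee table changes the logical structure, yet the application query and its result stay the same.

```python
import sqlite3

# The only statement the "application program" knows about.
APP_QUERY = "SELECT empid, name FROM emp_directory ORDER BY empid"


def build():
    """Create the base table and the view the application reads."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (empid INTEGER PRIMARY KEY, name TEXT, salary REAL)"
    )
    conn.executemany(
        "INSERT INTO employee VALUES (?, ?, ?)",
        [(1, "Ann", 5000.0), (2, "Bob", 4000.0)],  # hypothetical sample rows
    )
    conn.execute("CREATE VIEW emp_directory AS SELECT empid, name FROM employee")
    conn.commit()
    return conn


def change_physical(conn):
    """Physical change: add an index; no schema visible to users changes."""
    conn.execute("CREATE INDEX idx_employee_name ON employee (name)")
    conn.commit()


def change_logical(conn):
    """Logical change: split employee into staff and payroll, then redefine the view."""
    conn.execute("CREATE TABLE staff (empid INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE payroll (empid INTEGER PRIMARY KEY, salary REAL)")
    conn.execute("INSERT INTO staff SELECT empid, name FROM employee")
    conn.execute("INSERT INTO payroll SELECT empid, salary FROM employee")
    conn.execute("DROP VIEW emp_directory")
    conn.execute("DROP TABLE employee")
    conn.execute("CREATE VIEW emp_directory AS SELECT empid, name FROM staff")
    conn.commit()


def run_app(conn):
    """Run the unchanged application query."""
    return conn.execute(APP_QUERY).fetchall()
```

The view plays the role of the mapping between levels: only its definition is rewritten when the tables underneath change, not the application.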