Class 1 - Database Overview
(3/30/1999)
Fundamental database Concepts
Data
Data is a gathered body of facts that are of interest. It helps you keep records and make
decisions.
Database
A database is a collection of sharable data (or information) for a common purpose. It is
organized so that its contents can easily be accessed, managed and updated.
DBMS (Database Management System)
A program product for keeping computerized records about an enterprise.
A layer of software between users and data storage.
A DBMS consists of a collection of interrelated data and a set of programs to access those
data.
Ex: Oracle, Sybase, Informix…
Users send requests to access the database to the DBMS, and the DBMS performs the requested
operations on the data.
DBMS Functions
• Interaction with the users and data storage
• Security management
• Data integrity
• Backup and recovery
• Concurrency control
• Transaction support
• Data dictionary
Relational DBMS
Using tables to keep records
Most popular DBMS
Properties of relational databases
• Represents data in the form of tables
• Rows must be uniquely identifiable (using primary keys; Oracle does not enforce it)
• No duplicate column names in a table (the same name can appear in different tables)
• Order of rows is insignificant (use ORDER BY to order rows when retrieving)
• Order of columns is insignificant (changing the order should not affect programs)
• Values are atomic (no multi-valued attributes and no repeating groups)
• No hard-coded relationships between tables (a relationship can be established by joins at any
time. Foreign keys enforce referential integrity but are not required for a join)
• Hidden physical implementation (users specify what's in the db but not how to store it; the DBMS
does that)
• Uses system tables so the users can access the data dictionary
• Uses SQL to access data (set-oriented)
• Supports null values
Major Players

Product      Company
-----------  -----------
ORACLE       ORACLE
INGRES       INGRES
DB2          IBM
INFORMIX     INFORMIX
SYBASE       SYBASE
SQL SERVER   MICROSOFT
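[Figure: Database Vendors' 1994 Revenues (in millions). Source: IDC, 1995.
Vendors charted: Informix, CA, Sybase, Oracle, IBM, Others.
Values charted: $320, $397, $506, $1,158, $1,482, $2,514. The pairing of values with vendors
is not preserved in this transcript.]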
Gartner Group predicts that by the year 2000, the “Big Five” database vendors will be IBM,
Informix, Microsoft, Oracle, and Sybase.
Data Models
Address two major issues: structure and operations
Hierarchical
1970s: IMS (Information Management System) from IBM
A child can have only one parent. To reach a child, you have to access its parent and work down.
Network
1971 CA-IDMS/DB from Computer Associates International Inc.
Introduced to overcome the hierarchical model's limitations; it is an extended version of the
hierarchical model.
A child can have more than one parent. Manipulating the data requires complicated coding.
Relational
Comparison of the relational data model and the other two

Relational                                     Hierarchical/Network
---------------------------------------------  ------------------------------------------
Strong mathematical foundations (set theory)   No theory – outgrowth of practice
Set-oriented                                   Record-oriented
Logical pointers (implicit)                    Actual physical pointers (explicit)
More user-friendly                             Need special knowledge of the DB structure
Uses tables, rows, columns…                    Uses trees/graphs
Strong data independence                       More difficult to change anything
SQL (Structured Query Language)
SQL is a standard interactive and programming language for getting information from and
making changes to a database. Its statements are grouped into DDL, DML, and DCL. Although SQL
is both an ANSI and an ISO standard, many database products have their own flavors that extend
SQL, for example Oracle's PL/SQL. But the basic operations are very standard.
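
The DDL/DML split, together with a few of the relational properties listed earlier, can be tried out in a short sketch. It uses Python's built-in sqlite3 module as a stand-in for a full RDBMS such as Oracle; the dept and emp tables and their contents are hypothetical, and SQLite has no DCL statements (GRANT, REVOKE), so that category is not shown.

```python
import sqlite3

# SQLite (in memory) stands in for a full RDBMS; all names below are hypothetical.
conn = sqlite3.connect(":memory:")

# DDL: define the structure of the tables.
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT)")
conn.execute("CREATE TABLE emp (empid INTEGER PRIMARY KEY, name TEXT, deptno INTEGER)")

# DML: put rows into the tables.
conn.execute("INSERT INTO dept VALUES (10, 'Sales'), (20, 'Research')")
conn.execute("INSERT INTO emp VALUES (2, 'Lee', 20), (1, 'Adams', 10), (3, 'Cho', NULL)")
conn.commit()

# Rows are uniquely identifiable: a second row with empid 1 is rejected.
try:
    conn.execute("INSERT INTO emp VALUES (1, 'Duplicate', 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Row order is insignificant, so ORDER BY is needed for a guaranteed order.
names = [row[0] for row in conn.execute("SELECT name FROM emp ORDER BY name")]

# No foreign key was declared, yet a join relates the two tables by value.
joined = conn.execute(
    "SELECT e.name, d.dname FROM emp e JOIN dept d ON e.deptno = d.deptno "
    "ORDER BY e.name"
).fetchall()

# Null values are supported: Cho has no department and drops out of the join.
unassigned = [row[0] for row in conn.execute("SELECT name FROM emp WHERE deptno IS NULL")]
```

Each query asks for a set of rows by value, never for a record at a particular storage position, which is what "set-oriented" means in practice.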
Users
(1) End users
Naive users: access the database via form- or menu-driven applications (which are
written by application programmers).
Casual end users: some DB knowledge needed (e.g. middle or high level managers)
Sophisticated end users: engineers and business analysts who are familiar with the DBMS.
(2) Application programmers
They write the menu- or form-driven applications used by naive users.
They use a host language (C, C++, Pascal, COBOL, PL/I, VB, PowerBuilder) and
a sublanguage for database queries (SQL).
Embedded SQL: executes SQL statements within a program written in a regular
language, for example, C (Pro*C).
PL/SQL: procedural language extension of SQL.
(3) DBA: database administrator
Functions:
Schema definition, storage structure
Security ( access control, set up accounts, roles )
Monitor and fine tune performance
Backup and recovery
Integrity constraints
Benefits of Database Approach
Before the advent of the DBMS, permanent records (information) were typically stored in various
files, and different application programs were written to extract records from and to add
records to the appropriate files.
Compared to the file processing system, the DBMS has the following advantages:
(1) To provide an integrated collection of data available to a variety of users.
Integrated: non-redundancy (no repetition) or controlled-redundancy
(2) Consistency:
Redundancy can cause inconsistency:
When there is duplicated storage, you have to update ALL the records.
(3) Integrity - Accuracy
Violated because of
Human error
Redundancy
Program bugs
Machine failure (may cause incomplete transactions )
An important data accuracy issue is transaction control. To ensure data integrity
and accuracy in case of system failure, a transaction must be atomic: either all of it
completes or none of it does (no partial transaction).
Transaction: A logical unit of work performed by a collection of DML statements
(DML being insert, update, delete…). For example:

Transfer $1000 from insured money market to checking account

                      Insured money market (I)   Checking (C)
                      ------------------------   ------------
Initial balances      5000.00                    300.00

Time  Operation
----  -------------------
1.    read I = 5000
2.    I = I - 1000
3.    write back I = 4000
4.    read C = 300.00
5.    C = C + 1000
6.    write C = 1300
If the system fails at time 4, what happens to the customer's account?
Solution for preventing this kind of atomicity anomaly:
commit - makes all pending changes permanent
rollback – discard all pending changes
savepoint – Allows a rollback to the savepoint marker
Oracle implements an implicit savepoint for each DML statement, and if that statement fails, it
automatically rolls back just that statement, leaving the work of earlier successful DML
statements in place. But Oracle doesn't know what a transaction is from the business point of
view. So to ensure that a transaction is atomic, you need to explicitly issue a commit or
rollback command at the end of the transaction as you define it.
A transaction begins when the first executable SQL statement is executed. It ends when one of
the following occurs:
• Commit or rollback;
• DDL or DCL statements execute (automatic commits)
• User exits
• System crashes
Before a commit, data changes are held in a buffer and only the user who made them can see
them. Other users do not see the changes until a commit is issued, at which point the changes
become permanent. A rollback undoes the changes back to the last commit; a rollback to
savepoint A undoes only the changes made after A.
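
The money market transfer above can be sketched as an all-or-nothing transaction. This is a minimal illustration using Python's built-in sqlite3 module rather than Oracle; the account table, the transfer and balances helpers, and the fail_midway switch (which simulates a system failure between the two writes) are all hypothetical names introduced for the example.

```python
import sqlite3

# SQLite (in memory) stands in for Oracle; table and function names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES ('money_market', 5000.0), ('checking', 300.0)")
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    """Move amount from src to dst as a single transaction (all or nothing)."""
    try:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        if fail_midway:
            # Simulated failure after the debit but before the credit.
            raise RuntimeError("simulated failure between the two writes")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
        )
        conn.commit()    # make both changes permanent together
    except Exception:
        conn.rollback()  # discard the pending debit
        raise


def balances(conn):
    """Return {account name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM account"))


# A failed transfer leaves both balances untouched.
try:
    transfer(conn, "money_market", "checking", 1000, fail_midway=True)
except RuntimeError:
    pass
after_failure = balances(conn)

# A successful transfer changes both balances.
transfer(conn, "money_market", "checking", 1000)
after_success = balances(conn)
```

Because the rollback discards the pending debit, the customer never ends up with $1000 missing from one account and not yet added to the other.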
(4) Security Control
Not every user should be able to access all the data (e.g. payroll, salary).
Database security can be classified into 2 categories: system security and data security.
System security covers access to and use of the database at the system level, such as usernames
and passwords and the disk space allocated to users. Data security covers access to and use of
the database objects and the actions users can perform on those objects.
They are controlled by 2 types of privileges:
System privileges: access to the database, e.g. create users, create roles, remove users and
roles, back up tables, remove tables. A DBA, who has high-level privileges, can grant users
specific system privileges such as create table, create view, and create procedure.
Object privileges: the ability to manipulate the content of database objects, e.g. insert,
select, delete, and update on tables and views, and execute on procedures.
Security control makes sure the right people access and manipulate the right data.
(5) Concurrent access control
Data can be shared. This means destructive interaction between concurrent
transactions is possible. For example, the following case should be prevented.
E.g. Husband withdraws $300 from checking account at branch A, wife withdraws
$150 from the same account from another branch B.
                      at branch A             at branch B
                      ----------------------  ----------------------
before transaction    C = 1000.00
same time             read account            read account
                      C = 1000.00             C = 1000.00
                      withdraw $300.00        withdraw $150.00
                      C = C - 300 = 700       C = C - 150 = 850
                      write back              write back

The final result: they have $850 in the account after withdrawing $450.00.
The DBMS guarantees that data can be read, but not written, by more than one transaction at the
same time. This is accomplished using a locking mechanism. With Oracle, locking is done
automatically: Oracle uses the lowest applicable level of locking to provide the highest degree
of concurrency while still providing maximum data integrity. Locks are held for the duration of
the transaction and are released when a commit or rollback is issued.
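
The two-branch withdrawal above can be replayed in a small sketch. It is plain Python, not DBMS code: the Account class, its lock (a threading.Lock playing the role of an exclusive lock on the account row), and both functions are hypothetical names introduced for illustration.

```python
import threading


class Account:
    """Toy account; its lock plays the role of an exclusive lock on the row."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()


def interleaved_withdrawals(start, amount_a, amount_b):
    """Replay the unsafe schedule: both branches read before either writes."""
    read_a = start                 # branch A reads the balance
    read_b = start                 # branch B reads the same balance at the same time
    balance = read_a - amount_a    # A writes back its result
    balance = read_b - amount_b    # B overwrites A's write: A's update is lost
    return balance


def locked_withdraw(account, amount):
    """Read, compute, and write back while holding the lock."""
    with account.lock:
        current = account.balance
        account.balance = current - amount


lost_update_balance = interleaved_withdrawals(1000, 300, 150)

acct = Account(1000)
threads = [
    threading.Thread(target=locked_withdraw, args=(acct, amount))
    for amount in (300, 150)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
locked_balance = acct.balance
```

With the lock held across the whole read-modify-write, the second withdrawal always sees the first one's result, so the final balance is 550 regardless of which branch goes first.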
Basic locking modes
Exclusive lock – nobody else can
Share
Implicit locking – when insert, update, delete…(Oracle does it automatically)
Explicit locking – a user can obtain an exclusive lock by explicitly issuing a
SELECT ... FOR UPDATE statement.
Levels of locking:
Row-level locking (Insert, update, delete)
Table-level locking (DDL operations)
(6) Data independence (see the last section)
Disadvantages of using DBMS
(1) High cost of DBMS software product
(2) High cost of maintenance
(3) High complexity
The three level architecture of DBMS
When we compare the relational data model with the other two data models, one of its
advantages is strong data independence. Data independence is achieved
through the three-level architecture.
Three levels of the architecture (ANSI/SPARC: American National Standards
Institute / Standards Planning and Requirements Committee)
Purpose: to provide data independence by separating the three levels, that is,
to separate the way the database is physically represented from the way the users think about
it.
In 1970, E.F. Codd introduced the relational data model.
Part of the abstract of "A Relational Model of Data for Large Shared Data
Banks" (Communications of the ACM, 13(6), 1970, reprinted in Readings in Database Systems, 1998)
talks about data independence:
"Future users of large data banks must be protected from having to know how the data is
organized in the machine (the internal representation)… Activities of users at terminals and
most application programs should remain unaffected when the internal representation of
data is changed and even when some aspects of the external representation are changed."
External level: concerned with the way the data is viewed by the individual users.
Internal level: concerned with the way the data is physically stored.
Conceptual level: level of indirection between the other two.
Schema: a collection of data structures, e.g. Employee (empid, name, DOB, salary, ...)
External level - logical view of individual users
Closest to the users; it depends on the individual user.
Users: end users (naive and sophisticated)
A user's view is only part of the entire DB.
Conceptual level - the entire information content is defined at this level
Describes what data are stored in the database and the relationships that exist among
the data (the entire database is described in terms of a small number of relatively
simple structures).
Internal level - closest to physical storage
Describes how the data is actually stored
(complex, low-level data structures are described in detail)
Without the conceptual level, any change on either side (external level or internal level)
would require a corresponding change on the other side.
Data Independence


Physical structure
The way data values are stored and organized in the system
Logical structure
The user's view of the data
Data independence
(1) Physical data independence
Physical structures of DB can be changed without requiring a rewrite of application
programs
(2) Logical data independence
The logical structure of the database can be changed without affecting the application
programs (or: the ability to modify the conceptual schema without causing
application programs to be rewritten).
Modification at the conceptual level is necessary whenever the logical structure of the
database is altered
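
Both kinds of data independence can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in DBMS, and every table, view, and index name is hypothetical. The view emp_directory plays the role of an external-level view, the base tables play the role of the conceptual schema, and the index is an internal-level (physical) detail. The application issues the same query text throughout.

```python
import sqlite3

# SQLite (in memory) stands in for a DBMS; all object names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (empid INTEGER PRIMARY KEY, name TEXT, salary REAL);
    INSERT INTO employee VALUES (1, 'Adams', 50000), (2, 'Lee', 60000);
    CREATE VIEW emp_directory AS SELECT empid, name FROM employee;
""")

# The "application program": it only knows the view, and its text never changes.
APP_QUERY = "SELECT empid, name FROM emp_directory ORDER BY empid"
before = conn.execute(APP_QUERY).fetchall()

# Physical change: add an index. Storage and access paths change, the query does not.
conn.execute("CREATE INDEX idx_employee_name ON employee (name)")
after_physical = conn.execute(APP_QUERY).fetchall()

# Logical change: split employee into two tables, then redefine the view so the
# external level still presents the same columns.
conn.executescript("""
    DROP VIEW emp_directory;
    CREATE TABLE person (empid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pay (empid INTEGER PRIMARY KEY, salary REAL);
    INSERT INTO person SELECT empid, name FROM employee;
    INSERT INTO pay SELECT empid, salary FROM employee;
    DROP TABLE employee;
    CREATE VIEW emp_directory AS SELECT empid, name FROM person;
""")
after_logical = conn.execute(APP_QUERY).fetchall()
```

Only the view definition (the mapping between levels) had to change after the table was split; the application's query stayed the same in all three runs.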