Download ch1_intro_csi3317-05..

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Oracle Database wikipedia , lookup

Serializability wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

Clusterpoint wikipedia , lookup

Database wikipedia , lookup

Concurrency control wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
Database Management Systems
Chapter 1
Professor: Iluju Kiringa
[email protected]
http://www.site.uottawa.ca/~kiringa
SITE 5072
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
1
What Is a DBMS?


A database is a very large, integrated collection of
data.
Organizations are faced with HUGE amounts of data
that needs to be efficiently managed.
 Many enterprises deal with GBs, even TBs of data
 Some scientific applications deal with PBs of data

Models real-world organization / enterprise.



Entities (e.g., students, courses, faculty, and classrooms)
Relationships (e.g., Sue is enrolled in CSI3317; Iluju teaches
CSI3317, CSI3317 is taught in TBT070)
A Database Management System (DBMS) is a software
package designed to store and manage databases.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
2
Drawbacks of Files Systems
Each application must move large datasets
between main memory and secondary storage
(must deal with, e.g., buffering, page-oriented
access, etc.)
 Each application must deal with some method
of identifying all data items in case the
available addressing mode is not sufficient
(e.g., 32-bit addressing cannot directly access
more than 4GB.)

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
3
Drawbacks of Files Systems (Cont’d)
Need special code for different queries (i.e. all
query codes are ad hoc).
 Must protect data from inconsistency due to
multiple concurrent users changing it.
 Must ensure consistent crash recovery.
 Must provide more security and access control
than the password mechanism offered by
operating systems.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
4
Why Use a DBMS?






Data independence: application don’t see details of
data representation and storage.
Efficient access: use of very sophisticated data storage
and access methods.
Reduced application development time:
functionalities of a DBMS need not be duplicated.
Data integrity and security: enforcing integrity
constraints and access control.
Uniform data administration: experienced users
administer data that is used by inexperienced ones.
Concurrent access, recovery from crashes: users
access data without thinking of whoever else uses it.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
5
Why Study Databases??

?
Datasets increasing in diversity and volume


Digital libraries, interactive video, Human
Genome project, EOS project, Mars exploration
projects, …
... need for DBMS exploding
Efficient management of such a diverse
amount of data involves research in many
fundamental issues.
 Those issues are common to all the domains.
 DBMS encompasses most of CS


OS, languages, theory, AI, multimedia, logic, etc.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
6
Data Models




A data model is a collection of high-level
constructs for describing data.
A schema is a description of a particular collection
of data, using a given data model.
An instance of a schema is a sample set of data
organized using a given schema.
The relational model is the most widely used
model today.


Main concept: relation instance (relation), i.e. a
table with rows and columns.
Every relation has a relation schema (schema),
which describes the name and columns, or fields.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
7
Levels of Abstraction


Data schemas are given at
3 levels of abstractions.
Many views, single
conceptual (logical) schema
and physical schema.



View 1
View 2
View 3
Conceptual Schema
Views (or external schemas)
describe how users see the
data.
Conceptual schema defines
logical structure
Physical schema describes
the files and indexes used.
Physical Schema
* Schemas are defined using a data definition language (DDL).
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
8
Example: Enterprise Database

Conceptual schema:




Physical schema:



Emp(eid: integer, ename: string, age: integer, sal: real)
Works_in(eid: integer, did: integer, rating: real)
Dept(did: integer, budget: real, mgr: integer)
Relations stored as ordered files.
Index on first column of Emp, and Dept; index on
all columns of Works_in, etc.
External Schema (View):

Dept_count(did: integer, count: integer)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
9
Example: University Database

Conceptual schema:




Physical schema:



Students(sid: string, name: string, login: string,
age: integer, gpa: real)
Courses(cid: string, cname: string, credits: integer)
Enrolled(sid: string, cid: string, grade: string)
Relations stored as unordered files.
Index on first column of Students, and Courses; index
on two first columns of Enrolled, etc.
External Schema (View):

Course_info(cid: string, enrollment: integer)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
10
Data Independence *
Applications insulated from how data is
structured and stored.
 Each level of abstraction protected from
changes in the structure of the level below it.
 Logical data independence: Protection from
changes in logical structure of data.
 Physical data independence: Protection from
changes in physical structure of data.

* One of the most important benefits of using a DBMS!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
11
Querying Data

A query is a question involving the stored data, e.g.,





What is the name of the professor who teaches CSI3317?
Which percentage of students got an A+ in CSI3317?
A query language is a special purpose language in
which queries can be posed against databases.
Data is modified/queried using a data manipulation
language (DML). So a query language is a subset of a
DML
The relational model supports 2 query languages:


Relational calculus: logic-based query language
Relational algebra: based on a set of operators for
manipulating relations.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
12
Concurrency Control

Concurrent execution of user programs is
essential for good DBMS performance.

Because disk accesses are frequent, and relatively slow,
it is important to keep the cpu working by processing
several user programs concurrently.
Interleaving actions of different user programs
can lead to inconsistency: e.g., depositing money
while account balance is being computed.
 DBMS ensures such problems don’t arise: users
can pretend they are using a single-user system.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
13
Transaction: An Execution of a DB Program


A transaction is an atomic sequence of database actions
(reads/writes) corresponding to the execution of a
DB transaction program.
Each transaction, executed completely, must leave
the DB in a consistent state if DB is consistent when
the transaction begins.



Users can specify some simple integrity constraints on the
data, and the DBMS will enforce these constraints.
Beyond this, the DBMS does not really understand the
semantics of the data.
Thus, ensuring that a transaction (run alone) preserves
consistency is ultimately the user’s responsibility!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
14
Scheduling Concurrent Transactions

A DBMS ensures that execution of {T1, ... , Tn} is
equivalent to some serial execution of T1, ..., Tn.



Before reading/writing an object, a transaction requests
a lock on the object, and waits till the DBMS gives it the
lock. All locks are released at the end of the transaction.
(Strict 2PL locking protocol.)
Idea: If an action of Ti (say, writing X) affects Tj (which
perhaps reads X), one of them, say Ti, will obtain the
lock on X first and Tj is forced to wait until Ti completes;
this effectively orders the transactions.
If Tj already has a lock on Y and Ti later requests a lock
on Y, there is Deadlock! Ti or Tj is aborted and restarted!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
15
Ensuring Atomicity
A DBMS ensures atomicity (all-or-nothing
property) even if there is a crash in the middle of a
transaction.
 Idea: Keep a log (history) of all actions carried out
by the DBMS while executing a set of transactions:



Before a change is made to the database, the
corresponding log entry is forced to a safe location.
(WAL protocol; OS support for this is often inadequate.)
After a crash, the effects of partially executed
transactions are undone using the log. (Thanks to WAL, if
log entry wasn’t saved before the crash, corresponding
change was not applied to database!)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
16
The Log

The following actions are recorded in the log:

Ti writes an object: the old value and the new value.
• Log record must go to disk before the changed page!

Ti commits/aborts: a log record indicating this action.
Log records are chained together by transaction id, so
it’s easy to undo a specific transaction.
 Log is often archived on “stable” storage.
 All log related activities (and in fact, all CC related
activities such as lock/unlock, dealing with deadlocks
etc.) are handled transparently by the DBMS.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
17
People Involved with Databases




End users: just use an interface to a DBMS!
DBMS vendors: IBM, Oracle, Informix, Microsoft, etc
DBMS researchers and implementers: invent new
theories and algorithms for, and write code of DBMSs
DB application programmers: write C/C++/Java/…
programs that interact with DBMSs


E.g. smart webmasters, CSI3317 students, etc
Database administrator (DBA)




Designs logical /physical schemas
Handles security and authorization
Data availability, crash recovery
Database tuning as needs evolve
DBAs must understand how a DBMS works!
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
18
These layers
must consider
concurrency
control and
recovery
Structure of a DBMS



A typical DBMS has a
Query Optimization
layered architecture.
and Execution
The figure does not
Relational Operators
show the concurrency
Files and Access Methods
control and recovery
components.
Buffer Management
This is one of several
Disk Space Management
possible architectures;
each system has its own
variations.
DB
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
19
Historical Perspective





Early 1960s: GE designed 1st general-purpose DBMS,
based on the network data model.
Late 1960s: IBM developed IMS, based on the
hierarchical data model.
Early 1970s: E. Codd introduced the relational model
to solve problems related to the previous models.
Late 1970s and early 1980s: work on transaction
processing, mainly by Jim Gray and P. Bernstein,
All the 1980s: use of relational DBs became standard
practice in large corporations, many vendors entered
the market, standardization efforts on SQL, the query
language for relational DBs, etc.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
20
Historical Perspective (Cont’d)



Late 1980’s and early 1990s: richer data models (objectoriented, object-relational, etc), and more expressive
query languages (Datalog, nested relations, etc) are
introduced.
Late 1990s: major vendors extend relational DBMSs
towards new data types (images, text, multimedia
content) and queries based hereon.
All along the way:
 Scientific recognition: Turing Awards to DB
researchers E. Codd and J. Gray
 Birth of a huge, multibillion industry (IBM, Oracle …)
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
21
Summary







DBMS used to maintain, query large datasets.
Benefits include recovery from system crashes,
concurrent access, quick application development,
data integrity and security.
Levels of abstraction give data independence.
A DBMS typically has a layered architecture.
DB researchers, implementers, and administrators
hold responsible jobs and are well-paid!
DBMS R&D is one of the broadest,
most exciting areas in CS.
Databases are a multibillion industry !
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke
22