Chapter 1: Introduction and Basic Concepts ([S] chp. 1)
• Purpose of Database Systems
• View of Data
• Data Models
• Data Definition Language
• Data Manipulation Language
• Transaction Management
• Storage Management
• Database Administrator
• Database Users
• Overall System Structure
Database System Concepts
1.1
©Silberschatz, Korth and Sudarshan
• A database represents some aspect of the real world, sometimes called the
mini-world or the Universe of Discourse (UoD).
• A database is a logically coherent collection of data with some inherent
meaning.
A random assortment of data cannot correctly be referred to as a database.
• A database is designed, built, and populated with data for a specific
purpose. It has an intended group of users and some preconceived
applications in which these users are interested.
What Is a Database?
• A very large, integrated collection of data.
• Models real-world enterprise.
 Entities (e.g., students, courses)
 Relationships (e.g., Madonna is taking CS564)
• A Database Management System (DBMS) is a software
package designed to store and manage databases.
Database Management System (DBMS)
 Collection of interrelated data
 Set of programs to access the data
 DBMS provides an environment that is both convenient and
efficient to use.
 Database Applications:
• Banking: all transactions
• Airlines: reservations, schedules
• Universities: registration, grades
• Sales: customers, products, purchases
• Manufacturing: production, inventory, orders, supply chain
• Human resources: employee records, salaries, tax deductions
 Databases touch all aspects of our lives
Purpose of Database System
• In the early days, database applications were built on top of
file systems
• Drawbacks of using file systems to store data:
 Data redundancy and inconsistency
 Multiple file formats, duplication of information in different files
 Difficulty in accessing data
 Need to write a new program to carry out each new task
 Data isolation — multiple files and formats
 Integrity problems
 Integrity constraints (e.g. account balance > 0) become part
of program code
 Hard to add new constraints or change existing ones
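The point about integrity constraints can be sketched in code. The example below is only an illustration, assuming Python's built-in sqlite3 module as a stand-in DBMS; the table and column names (account, account_number, balance) and the helper names make_bank and try_insert are hypothetical and do not come from the slides. The constraint balance > 0 is declared once in the schema, so the DBMS rejects a violating row without any check in the application code.

```python
import sqlite3


def make_bank():
    """Create an in-memory database whose schema carries the constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " account_number TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance > 0))"  # declared once, in the schema
    )
    return conn


def try_insert(conn, account_number, balance):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO account (account_number, balance) VALUES (?, ?)",
                (account_number, balance),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised by SQLite when the CHECK (or PRIMARY KEY) constraint fails.
        return False
```

Changing the rule later means changing one line of the schema definition rather than hunting for checks scattered across application programs.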
Purpose of Database Systems (Cont.)
• Drawbacks of using file systems (cont.)
 Atomicity of updates
 Failures may leave database in an inconsistent state with partial
updates carried out
 E.g. transfer of funds from one account to another should either
complete or not happen at all
 Concurrent access by multiple users
 Concurrent access needed for performance
 Uncontrolled concurrent accesses can lead to inconsistencies
 E.g. two people reading a balance and updating it at the same
time
 Security problems
• Database systems offer solutions to all the above problems
Why Use a DBMS?
• Separation of the Data definition and the Program
• Abstraction into a simple model
• Data independence and efficient access.
• Reduced application development time – ad-hoc queries
• Data integrity and security.
• Uniform data administration.
• Concurrent access, recovery from crashes.
• Support for multiple different views
Why Study Databases?
• Shift from computation to information
 at the “low end”: scramble to webspace (a mess!)
 at the “high end”: scientific applications
• Datasets increasing in diversity and volume.
 Digital libraries, interactive video, Human Genome project, EOS project
 ... need for DBMS exploding
• DBMS encompasses most of CS
 OS, languages, theory, “AI”, multimedia, logic
Levels of Abstraction
• Many views, single conceptual (logical) schema and physical schema.
 Views describe how users see the data.
 Conceptual schema defines logical structure. Sometimes we separate
the conceptual level from the logical level.
 Physical schema describes the files and indexes used.
[Figure: View 1, View 2, View 3; Conceptual Schema; Physical Schema]
* Schemas are defined using DDL (Data Definition Language)
* Data is modified/queried using DML (Data Manipulation Language)
Levels of Abstraction
• Physical level: describes how a record (e.g., customer) is stored.
• Logical level: describes data stored in database, and the
relationships among the data.
    type customer = record
        name : string;
        street : string;
        city : string;
    end;
• View level: application programs hide details of data types.
Views can also hide information (e.g., salary) for security
purposes.
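How a view can hide information may be easier to see in a small sketch. The code below assumes Python's built-in sqlite3 module as a stand-in DBMS; the employee table, its rows, the view name employee_public, and the helper names are all hypothetical and chosen only for illustration. Users who are given the view see names and departments, while the salary column stays at the logical level.

```python
import sqlite3


def build_employee_db():
    """Create a hypothetical employee table plus a view that omits salary."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE employee (name TEXT, department TEXT, salary INTEGER);
        INSERT INTO employee VALUES ('Smith', 'cosc', 50000);
        INSERT INTO employee VALUES ('Brown', 'math', 60000);
        -- The view exposes only the columns that are safe to show.
        CREATE VIEW employee_public AS
            SELECT name, department FROM employee;
        """
    )
    return conn


def visible_columns(conn, relation):
    """Return the column names a user sees when querying the given table or view."""
    cursor = conn.execute(f"SELECT * FROM {relation}")
    return [column[0] for column in cursor.description]
```

In a full DBMS the view would be combined with access permissions so that some users can query only employee_public; SQLite has no user accounts, so that part is not shown.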
Instances and Schemas
• Similar to types and variables in programming languages
• Schema – the logical structure of the database
 E.g., the database consists of information about a set of customers and
accounts and the relationship between them
 Analogous to type information of a variable in a program
 Physical schema: database design at the physical level
 Logical schema: database design at the logical level
• Instance – the actual content of the database at a particular point
in time
 Analogous to the value of a variable
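The schema/instance distinction can be illustrated with a short sketch, assuming Python's built-in sqlite3 module as a stand-in DBMS. The helper names (schema_of, instance_of, demo) are hypothetical; the customer columns follow the record shown on the previous slide. The schema is read from the catalog and stays the same while the instance changes with every insert.

```python
import sqlite3


def schema_of(conn, table):
    """Return (column name, declared type) pairs: the schema of the table."""
    # PRAGMA table_info yields rows of (cid, name, type, notnull, dflt_value, pk).
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]


def instance_of(conn, table):
    """Return the rows currently stored: the instance at this point in time."""
    return conn.execute(f"SELECT * FROM {table}").fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (name TEXT, street TEXT, city TEXT)")
    schema_before = schema_of(conn, "customer")
    conn.execute("INSERT INTO customer VALUES ('Johnson', 'Alma', 'Palo Alto')")
    first_instance = instance_of(conn, "customer")
    conn.execute("INSERT INTO customer VALUES ('Smith', 'North', 'Rye')")
    second_instance = instance_of(conn, "customer")
    schema_after = schema_of(conn, "customer")
    return schema_before, schema_after, first_instance, second_instance
```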
student
name   student number  class  major
smith  17              1      cosc
brown  8               2      cosc

course
courseName                 coursenumber  credit hours  department
intro to computer science  cosc1310      4             cosc
data structures            cosc3320      4             cosc
dis                        math2410      3             math
database                   cosc3380      3             cosc

prerequisite
coursenumber  prerequisite number
cosc3380      cosc3320
cosc3330      math2410
cosc3320      cosc1310

section
sectionIdentifier  coursenumber  semester  year  instructor
85                 math2410      fall      86    king
92                 cosc1310      fall      86    anderson
102                cosc3320      spring    87    kuuth
112                math2410      fall      87    chang
119                cosc1310      fall      87    anderson
135                cosc3380      fall      87    stone

grade_report
student number  sectionIdentifier  grade
17              112                B
17              119                C
8               85                 A
8               92                 A
8               102                B
8               135                A
Data Models
• A collection of modeling tools for describing
 data
 data relationships
 data semantics
 data constraints
• Entity-Relationship model
• Relational model
• Other models:
 object-oriented model
 semi-structured data models (XML)
 Older models: network model and hierarchical model
Entity-Relationship Model
Example of schema in the entity-relationship model
Entity Relationship Model (Cont.)
• E-R model of real world
 Entities (objects)
 E.g. customers, accounts, bank branch
 Relationships between entities
 E.g. Account A-101 is held by customer Johnson
 Relationship set depositor associates customers with accounts
• Widely used for database design
 Database design in E-R model usually converted to design in the
relational model (coming up later) which is used for storage and
processing
Relational Model
• Example of tabular data in the relational model (the column headings are the attributes)
Customerid   customername  customerstreet  customercity  accountnumber
192-83-7465  Johnson       Alma            Palo Alto     A-101
019-28-3746  Smith         North           Rye           A-215
192-83-7465  Johnson       Alma            Palo Alto     A-201
321-12-3123  Jones         Main            Harrison      A-217
019-28-3746  Smith         North           Rye           A-201
A Sample Relational Database
Physical (Storage) schema decisions
• Mapping of entities to files (OS files)
• Data representation and encoding (compression)
• Access methods (Direct, Hashing, Indexed)
• Which indexes to maintain
• Clustering of records
• OS/DBMS issues (buffer management)
External (View) schema decisions
• Which entities to present/filter
• Data representation and encoding (compression)
• Programming language dependent issues
• Changes to names, order of attributes
• Derived (computed) fields and joined tables
(*) Not relational…
Data Independence
• Physical Data Independence – the ability to modify the physical
schema without changing the application programs
 Applications depend on the logical schema
 DBA may change physical level (tuning) without affecting
applications
 The DBMS automatically makes the required adjustments, and
application programs are not changed (queries may need to be
recompiled and optimized…)
• Logical Data Independence – the ability to modify the logical
schema without changing the application programs
 Applications depend on the logical schema via the Views
 Can be supported on a limited basis only (if view is not affected)
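Physical data independence can be sketched as follows, assuming Python's built-in sqlite3 module as a stand-in DBMS. The account numbers come from the relational example earlier in the deck, while the balances, the index name account_number_idx, and the function name demo are hypothetical. The application's query text is fixed; the physical schema changes when an index is added, and the query returns the same answer before and after.

```python
import sqlite3

# The application's query: it mentions only the logical schema.
QUERY = "SELECT balance FROM account WHERE account_number = ?"


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)",
        [("A-101", 500), ("A-215", 700), ("A-201", 900)],  # hypothetical balances
    )
    before = conn.execute(QUERY, ("A-215",)).fetchall()

    # Physical-level tuning by the DBA: a new access path, same logical schema.
    conn.execute("CREATE INDEX account_number_idx ON account (account_number)")

    after = conn.execute(QUERY, ("A-215",)).fetchall()
    return before, after
```

The DBMS decides on its own whether to use the new index when it plans the query; the application code is untouched.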
Data Definition Language (DDL)
• Specification notation for defining the database schema
 E.g.
    create table account (
        account-number char(10),
        balance integer)
• DDL compiler generates a set of tables stored in a data
dictionary
• Data dictionary contains metadata (i.e., data about data)
 database schema
 Data storage and definition language
 language in which the storage structure and access methods
used by the database system are specified
 Usually an extension of the data definition language
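The relation between a DDL statement and the data dictionary can be tried out with a small sketch, assuming Python's built-in sqlite3 module as a stand-in DBMS. SQLite keeps its catalog in a table named sqlite_master, which plays the role of the data dictionary here. The column is written account_number because a hyphen is not valid in an unquoted SQL identifier; the function names are hypothetical.

```python
import sqlite3


def create_account_table():
    """Run the DDL statement from the slide (with an underscore in the column name)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")
    return conn


def catalog_entries(conn):
    """Read the metadata that the DDL statement produced in SQLite's catalog."""
    return conn.execute("SELECT type, name, sql FROM sqlite_master").fetchall()
```

After create_account_table() runs, catalog_entries returns one row describing the account table, including the text of its definition. Other systems expose their dictionaries differently, so the catalog table name is specific to SQLite.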
Data Manipulation Language (DML)
• Language for accessing and manipulating the data organized by
the appropriate data model
 A declarative DML is also known as a query language
• Two classes of languages
 Procedural – user specifies what data is required and how to get
those data (DML)
 Nonprocedural – user specifies what data is required without
specifying how to get those data (Query language)
• SQL is the most widely used query language
SQL
• SQL: widely used non-procedural language
 E.g. find the name of the customer with customer-id 192-83-7465
    select customer.customer-name
    from customer
    where customer.customer-id = '192-83-7465'
 E.g. find the balances of all accounts held by the customer with
customer-id 192-83-7465
    select account.balance
    from depositor, account
    where depositor.customer-id = '192-83-7465' and
          depositor.account-number = account.account-number
• Application programs generally access databases through one of
 Language extensions to allow embedded SQL
 Application program interface (e.g. ODBC/JDBC) which allows SQL
queries to be sent to a database
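The two queries can be sent to a database from an application program through an API. The sketch below uses Python's built-in sqlite3 module, which plays a role comparable to ODBC/JDBC here; it is an illustration rather than the mechanism the slides have in mind. Identifiers use underscores because hyphens are not valid in unquoted SQL names, the balances are hypothetical, and the function names are made up for this example. Customer ids, names, and account numbers follow the relational example earlier in the deck.

```python
import sqlite3


def build_bank():
    """Create and populate the three tables the queries refer to."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (customer_id TEXT, customer_name TEXT);
        CREATE TABLE account (account_number TEXT, balance INTEGER);
        CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
        INSERT INTO customer VALUES ('192-83-7465', 'Johnson');
        INSERT INTO customer VALUES ('019-28-3746', 'Smith');
        INSERT INTO account VALUES ('A-101', 500);
        INSERT INTO account VALUES ('A-201', 900);
        INSERT INTO account VALUES ('A-215', 700);
        INSERT INTO depositor VALUES ('192-83-7465', 'A-101');
        INSERT INTO depositor VALUES ('192-83-7465', 'A-201');
        INSERT INTO depositor VALUES ('019-28-3746', 'A-215');
        """
    )
    return conn


def customer_name(conn, customer_id):
    """First query: the name of the customer with the given id."""
    row = conn.execute(
        "SELECT customer.customer_name FROM customer"
        " WHERE customer.customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0] if row else None


def balances(conn, customer_id):
    """Second query: balances of all accounts held by the customer."""
    rows = conn.execute(
        "SELECT account.balance FROM depositor, account"
        " WHERE depositor.customer_id = ?"
        " AND depositor.account_number = account.account_number"
        " ORDER BY account.account_number",
        (customer_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

The `?` placeholders pass the customer id as a parameter instead of pasting it into the query text, which is the usual way to send values through such an interface.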
Database Users
• Users are differentiated by the way they expect to interact with
the system
• Application programmers – interact with system through DML
calls
• Sophisticated users – form requests in a database query
language
• Specialized users – write specialized database applications that
do not fit into the traditional data processing framework
• Naïve users – invoke one of the permanent application programs
that have been written previously
 E.g. people accessing database over the web, bank tellers, clerical
staff
Database Administrator
• Coordinates all the activities of the database system; the
database administrator has a good understanding of the
enterprise’s information resources and needs.
• Database administrator's duties include:
 Schema definition
 Storage structure and access method definition
 Schema and physical organization modification
 Granting user authority to access the database
 Specifying integrity constraints
 Acting as liaison with users
 Monitoring performance and responding to changes in
requirements
Structure of a DBMS
• A typical DBMS has a layered architecture.
• The figure does not show the concurrency control and
recovery components.
• This is one of several possible architectures; each system has
its own variations.
[Figure: DBMS layers, top to bottom: Query Optimization and Execution; Relational Operators; Files and Access Methods; Buffer Management; Disk Space Management; DB. Callout: these layers must consider concurrency control and recovery.]
Transfer money from account A to account B
Begin Transaction
    SUBTRACT 100 FROM A
    CRASH!
    ADD 100 TO B
End Transaction
Abort, Commit, Rollback
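The all-or-nothing behaviour of the transfer can be sketched in code, assuming Python's built-in sqlite3 module as a stand-in DBMS. The starting balances, the SimulatedCrash exception, and the function names are hypothetical. An exception raised between the two updates stands in for the crash; a real crash is handled by the DBMS recovery component rather than by exception handling, so this only illustrates the visible effect of a rollback.

```python
import sqlite3


class SimulatedCrash(Exception):
    """Stands in for a failure between the two updates (hypothetical)."""


def open_bank():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
    conn.commit()
    return conn


def transfer(conn, amount, crash_midway=False):
    """Move amount from A to B as one transaction."""
    try:
        with conn:  # commit if the block finishes, roll back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,)
            )
            if crash_midway:
                raise SimulatedCrash("failure after SUBTRACT, before ADD")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,)
            )
    except SimulatedCrash:
        pass  # the partial update to A has already been rolled back


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))
```

After a transfer with crash_midway=True both balances are unchanged; after a normal transfer both updates are visible together.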
T1                       T2
READ #SEATS              READ #SEATS
#SEATS = #SEATS – 1      #SEATS = #SEATS – 1
WRITE #SEATS             WRITE #SEATS
Solution: Two-Phase locking
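The seat-booking anomaly can be reproduced deterministically in a few lines of Python. This is a sketch under simplifying assumptions: the first function replays the interleaving by hand, and the second protects the whole read-modify-write with a single threading.Lock, which is much simpler than a full two-phase locking protocol but shows the same idea of holding a lock until the update is complete. All names are hypothetical.

```python
import threading


def interleaved_without_locking(seats):
    """Replay the schedule from the slide: both users read before either writes."""
    t1_read = seats          # T1: READ #SEATS
    t2_read = seats          # T2: READ #SEATS
    seats = t1_read - 1      # T1: WRITE #SEATS
    seats = t2_read - 1      # T2: WRITE #SEATS (overwrites T1's update)
    return seats


class SeatCounter:
    """Seat count whose read-modify-write is protected by a lock."""

    def __init__(self, seats):
        self.seats = seats
        self._lock = threading.Lock()

    def book_one(self):
        with self._lock:     # held for the whole read, compute, write sequence
            current = self.seats
            self.seats = current - 1


def book_with_locking(seats, bookings=2):
    """Run the bookings in separate threads; the lock serializes them."""
    counter = SeatCounter(seats)
    threads = [threading.Thread(target=counter.book_one) for _ in range(bookings)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.seats
```

Starting from 10 seats, the unlocked interleaving ends at 9 even though two seats were sold, while the locked version ends at 8.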
Overall System Structure
Storage Management
• Storage manager is a program module that provides the
interface between the low-level data stored in the database and
the application programs and queries submitted to the system.
• The storage manager is responsible for the following tasks:
 interaction with the file manager
 efficient storing, retrieving and updating of data
Concurrency Control
• Concurrent execution of user programs is essential for
good DBMS performance.
 Because disk accesses are frequent, and relatively slow, it is important
to keep the CPU humming by working on several user programs
concurrently.
• Interleaving actions of different user programs can lead to
inconsistency: e.g., check is cleared while account balance is
being computed.
• DBMS ensures such problems don’t arise: users can pretend they
are using a single-user system.
Transaction Management
• A transaction is a collection of operations that performs a single
logical function in a database application
• Transaction-management component ensures that the database
remains in a consistent (correct) state despite system failures
(e.g., power failures and operating system crashes) and
transaction failures.
• Concurrency-control manager controls the interaction among the
concurrent transactions, to ensure the consistency of the
database.
Transaction: An Execution of a DB Program
• Key concept is transaction, which is an atomic sequence of database
actions (reads/writes).
• Each transaction, executed completely, must leave the DB in a
consistent state if DB is consistent when the transaction begins.
 Users can specify some simple integrity constraints on the data, and the
DBMS will enforce these constraints.
 Beyond this, the DBMS does not really understand the semantics of the
data (e.g., it does not understand how the interest on a bank account is
computed).
 Thus, ensuring that a transaction (run alone) preserves consistency is
ultimately the user’s responsibility!
Scheduling Concurrent Transactions
• DBMS ensures that execution of {T1, ... , Tn} is equivalent to some
serial execution T1’ ... Tn’.
 Before reading/writing an object, a transaction requests a lock on the object,
and waits till the DBMS gives it the lock. All locks are released at the end of
the transaction. (Strict 2PL locking protocol.)
 Idea: If an action of Ti (say, writing X) affects Tj (which perhaps reads X),
one of them, say Ti, will obtain the lock on X first and Tj is forced to wait
until Ti completes; this effectively orders the transactions.
 What if Tj already has a lock on Y and Ti later requests a lock on Y?
(Deadlock!) Ti or Tj is aborted and restarted!
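The locking rule described above can be sketched as a tiny lock table. This is a simplified illustration, not the mechanism of any particular DBMS: it assumes exclusive locks only, reports "must wait" by returning False instead of blocking, and leaves out deadlock handling entirely. The class and method names are hypothetical.

```python
class LockManager:
    """Minimal strict-2PL-style lock table with exclusive locks only."""

    def __init__(self):
        self.owner = {}  # object name -> transaction currently holding its lock
        self.held = {}   # transaction -> set of object names it has locked

    def request(self, txn, obj):
        """Grant the lock if obj is free or already held by txn; otherwise refuse."""
        holder = self.owner.get(obj)
        if holder is None or holder == txn:
            self.owner[obj] = txn
            self.held.setdefault(txn, set()).add(obj)
            return True
        return False  # txn has to wait until the holder finishes

    def release_all(self, txn):
        """Called once, at the end of the transaction: release every lock together."""
        for obj in self.held.pop(txn, set()):
            del self.owner[obj]
```

If Ti obtains the lock on X first, a request from Tj for X is refused until Ti calls release_all, which is what orders the two transactions. A real lock manager would also queue waiting transactions and detect or prevent deadlocks.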
The importance of the Data Dictionary
• Contains all definitions: DDL (logical schema), Views definition,
Physical schema definitions including Indexing and clustering
information, Integrity constraints, security rules, stored
procedures (SQL)
• Essential for query parsing and optimization
• Contains other important documentation and programs
(regulations, standards, codes, etc.)
• There are companies who sell Data Dictionary tools as a
separate product!
• Logical Design and Data-Dictionary Tools
• Loading
• Physical Design and File reorganization
• Backup / Restore / Recovery
• Performance Monitoring and Tuning
Application Architectures
Two-tier architecture: E.g. client programs using ODBC/JDBC to
communicate with a database
Three-tier architecture: E.g. web-based applications, and
applications built using “middleware”
• Hierarchical – Pre-historic – IMS
• Network – Historic – IDMS, ADABAS; led to Object-Oriented
• Relational – Current – 95% of the market – Oracle, Informix, SQL
Server, Progress, IBM DB2, etc.
• Object-Oriented – Current – a lot of hoo-ha but a very narrow market,
mainly CAD and engineering – Objectivity, Versant, Jasmine
• Object-Relational – Current / Future – SQL3, Informix UDO,
Oracle-9, IBM DB2.
• XML – not much commercial success as a database, in spite of much
research
• Cloud and NoSQL databases
PRE-1960s
1945 – Magnetic tapes developed (the first medium to allow searching).
1957 – First commercial computer installed.
1959 – McGee proposed the notion of generalized access to electronically stored data.
THE 60s
1961 – The first generalized DBMS, GE's Integrated Data Store (IDS), designed by Bachman.
THE 70s – database technology experienced rapid growth.
1970 – The relational model is developed by Ted Codd, an IBM research fellow.
1971 – CODASYL Database Task Group Report.
1975 – ACM Special Interest Group on Management of Data organized the first SIGMOD international
conference.
1976 – Entity-relationship (ER) model introduced by Chen.
THE 80s – DBMSs developed for personal computers (DBASE, PARADOX, etc.).
1983 – ANSI/SPARC survey revealed that more than 100 relational systems had been implemented by the
beginning of the 80s.
1985 – Preliminary SQL standard published. Business world influenced by “Fourth Generation
Languages”.
* Trends in the ’80s: extendable database systems: object-oriented DBMSs, client-server
architecture for distributed databases.
The ’90s
* Demand for extending DBMS capabilities to meet new applications.
* Emergence of commercial object-oriented DBMSs.
* Demand for exploiting massively parallel processors (MPPs).
• Total victory by the relational model
• SQL 3
• Object-relational systems.
The ’00s
• The emergence of XML and the integration of XML and relational databases
• Web databases, search engines, Semantic Web
• Cloud and NoSQL databases
Databases make these folks happy ...
• End users and DBMS vendors
• DB application programmers
 E.g. smart webmasters
• Database administrator (DBA)
 Designs logical/physical schemas
 Handles security and authorization
 Data availability, crash recovery
 Database tuning as needs evolve
Must understand how a DBMS works!
Summary
• DBMS used to maintain, query large datasets.
• Benefits include recovery from system crashes, concurrent access,
quick application development, data integrity and security.
• Levels of abstraction give data independence.
• A DBMS typically has a layered architecture.
• DBAs hold responsible jobs and are well-paid!
• DBMS R&D is one of the broadest, most exciting areas in CS.
• Advanced databases course at the graduate level