Database Management System
Introduction
Warning
• This class is a lot of work.
• But it is worth it.
• Of all courses you take at CS, this
may be the one that gets you a job.
Syllabus
• The background and history of database
management systems.
• The fundamentals of using a database
management system.
• Relational model.
• Queries and Updates.
• Relational Algebra.
• Normalization
• Transactions and Security.
• Object-oriented, object-relational, semi-structured
and XML database systems.
What Is a Database System?
• Database: a very large, integrated collection of data.
• Models a real-world enterprise
  – Entities (e.g., teams, games)
  – Relationships (e.g., Abo Teraka plays for Al Ahly)
• More recently, also includes active components,
often called “business logic” (e.g., the BCS ranking
system)
• A Database Management System (DBMS) is a software
system designed to store, manage, and facilitate access
to databases.
Why Study Databases?
• Shift from computation to information
  – always true for corporate computing
  – Web made this point for personal computing
  – more and more true for scientific computing
• Need for DBMS has exploded in the last years
  – Corporate: retail swipe/clickstreams, “customer
    relationship”, “supply chain”, “data warehouses”, etc.
  – Scientific: digital libraries, Human Genome project, NASA
    Mission to Planet Earth, physical sensors, grid physics
    network
• DBMS encompasses much of CS in a practical discipline
  – OS, languages, theory, AI, multimedia, logic
  – Yet traditional focus on real-world apps
databases you may use
Database Applications
• These examples are what we call traditional database
applications
(First part of book focuses on traditional applications)
• More Recent Applications:
  – YouTube
  – iTunes
  – Geographic Information Systems (GIS)
  – Data Warehouses
  – Many other applications
Database Systems: Then
History of Database Systems
• 1950’s and early 1960’s:
  – Data processing using magnetic tapes for storage
    (tapes provide only sequential access)
  – Punched cards for input
• Late 1960’s and 1970’s:
  – Hard disks allow direct access to data
  – Network and hierarchical data models in widespread use
  – Ted Codd defines the relational data model
    (would win the ACM Turing Award for this work)
  – IBM Research begins System R prototype
  – UC Berkeley begins Ingres prototype
  – High-performance (for the era) transaction processing
History (cont.)
• 1980s:
  – Research relational prototypes evolve into commercial systems
    (SQL becomes industry standard)
  – Parallel and distributed database systems
  – Object-oriented database systems
• 1990s:
  – Large decision support and data-mining applications
  – Large multi-terabyte data warehouses
  – Emergence of Web commerce
• 2000s:
  – XML and XQuery standards
  – Automated database administration
  – Increasing use of highly parallel database systems
  – Web-scale distributed data storage systems
Is a File System a DBMS?
• Thought Experiment 1:
  – You and your project partner are editing the same file.
  – You both save it at the same time.
  – Whose changes survive?
  A) Yours B) Partner’s C) Both D) Neither E) ???
• Thought Experiment 2:
  – You’re updating a file.
  – The power goes out.
  – Which of your changes survive?
  A) All B) None C) All Since Last Save D) ???
Q: How do you write programs over a subsystem when it
promises you only “???”?
A: Very, very carefully!!
Can we do it without a DBMS ?
Sure we can! Start by storing the data in files:
students.txt
courses.txt
professors.txt
Now write C or Java programs to implement specific tasks
Doing it without a DBMS...
Write a C program to do the following:
• Enroll “Mary Johnson” in “CSE444”:
Read ‘students.txt’
Read ‘courses.txt’
Find&update the record “Mary Johnson”
Find&update the record “CSE444”
Write “students.txt”
Write “courses.txt”
Enter a DBMS
“Two tier database system”
[Figure: data files, database server (someone else’s C program), applications]
Problems without a DBMS...
• System crashes:
Read ‘students.txt’
Read ‘courses.txt’
Find&update the record “Mary Johnson”
Find&update the record “CSE444”
Write “students.txt”
Write “courses.txt”
CRASH !
• What is the problem ?
• Large data sets (say 50GB)
• What is the problem ?
• Simultaneous access by many users
• Need locks: we know them from OS, but now the data is on disk;
and is it any fun to re-implement them?
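The crash problem can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python instead of C, and the `key|value` line format, the `enroll` helper, and the `SimulatedCrash` exception are all made up for the example. The crash is assumed to hit between the two writes.

```python
import os
import tempfile


class SimulatedCrash(Exception):
    """Stands in for a power failure or system crash (hypothetical)."""


def read_records(path):
    # One record per line, in an assumed "key|value" format.
    records = {}
    with open(path) as f:
        for line in f:
            key, value = line.rstrip("\n").split("|", 1)
            records[key] = value
    return records


def write_records(path, records):
    with open(path, "w") as f:
        for key, value in records.items():
            f.write(f"{key}|{value}\n")


def enroll(directory, student, course, crash_between_writes=False):
    students_path = os.path.join(directory, "students.txt")
    courses_path = os.path.join(directory, "courses.txt")
    students = read_records(students_path)
    courses = read_records(courses_path)
    # Find & update both records in memory.
    taken = [c for c in students[student].split(",") if c]
    students[student] = ",".join(taken + [course])
    courses[course] = str(int(courses[course]) + 1)
    # Two separate writes: nothing makes them succeed or fail together.
    write_records(students_path, students)
    if crash_between_writes:
        raise SimulatedCrash("crashed after writing students.txt")
    write_records(courses_path, courses)


def demo_crash():
    with tempfile.TemporaryDirectory() as d:
        write_records(os.path.join(d, "students.txt"), {"Mary Johnson": ""})
        write_records(os.path.join(d, "courses.txt"), {"CSE444": "0"})
        try:
            enroll(d, "Mary Johnson", "CSE444", crash_between_writes=True)
        except SimulatedCrash:
            pass
        students = read_records(os.path.join(d, "students.txt"))
        courses = read_records(os.path.join(d, "courses.txt"))
        return students["Mary Johnson"], courses["CSE444"]
```

After `demo_crash()`, the student file says Mary Johnson is enrolled in CSE444 while the course file still counts zero students: the two files disagree, and nothing in the program can tell which one is right.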
Why Use a DBMS?
• Without a DBMS, we'd have:
  – data stored as bits on disks, organized as files
  – access by a collection of ad hoc programs in C++, Java, PHP, etc.,
    run by the users of the data
  – no control or coordination of what these programs do with the data
Why Use a DBMS?
• With a DBMS, we have:
  – data stored as bits on disks, organized as files
  – applications and the users of the data going through the DBMS
  – DBMS provides control and coordination to protect the data.
Database definition
• A database is “data”, or facts, supplied by a
“base”, or software
• Files contain data with the same structure
• A database is an integration of different
kinds of data
Database Systems
• The big commercial database vendors:
• Oracle
• IBM (with DB2) bought Informix recently
• Microsoft (SQL Server)
• Sybase
• Some free database systems (Unix):
• Postgres
• MySQL
• Predator
DBMS Functions
1. Define the database
2. Construct the database
3. Manipulate the database
4. Data security and integrity
5. Concurrency
6. Recovery
Disadvantages of database
• Expensive
• Incompatible with any other DBMS
Concurrency
• A DBMS supports access by concurrent users
• concurrent = happening at the same time
• concurrent access, particularly writes (data changes),
can result in inconsistent states
(even when the individual operations are correct)
• the DBMS can check the actual operations of
concurrent users, to prevent activity that will lead to
inconsistent states
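A small sketch of how individually correct operations can still produce an inconsistent state. Everything here is an assumption made for illustration: the `flight` table, the seat counts, and the use of one SQLite connection to play both users. A real DBMS handles this with locking or another concurrency-control scheme rather than with the application-level trick shown in `atomic_bookings`.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE flight (id INTEGER PRIMARY KEY, seats INTEGER)")
    conn.execute("INSERT INTO flight VALUES (1, 10)")
    return conn


def seats_left(conn):
    return conn.execute("SELECT seats FROM flight WHERE id = 1").fetchone()[0]


def interleaved_bookings():
    """Users A and B both read the seat count before either one writes."""
    conn = make_db()
    seen_by_a = seats_left(conn)
    seen_by_b = seats_left(conn)
    conn.execute("UPDATE flight SET seats = ? WHERE id = 1", (seen_by_a - 1,))
    conn.execute("UPDATE flight SET seats = ? WHERE id = 1", (seen_by_b - 1,))
    return seats_left(conn)  # B overwrote A's update: one booking is lost


def atomic_bookings():
    """Each booking is a single statement, so read and write cannot be split."""
    conn = make_db()
    for _ in range(2):
        conn.execute("UPDATE flight SET seats = seats - 1 WHERE id = 1")
    return seats_left(conn)
```

With two seats sold out of ten, `interleaved_bookings()` leaves 9 seats and `atomic_bookings()` leaves 8. Each statement in the first version is correct on its own; only the interleaving is wrong.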
Access Control
• A DBMS can restrict access to authorized
users
• security policies often require control that is
more fine-grained than that provided by a file
system
• since the DBMS understands the data
structure, it can enforce fairly sophisticated
and detailed security policies
• on subsets of the data
• on subsets of the available operations
Redundancy Control
• A DBMS can assist in controlling redundancy
• redundancy = multiple copies of the same data
• with file storage, it's often convenient to store
multiple copies of the same data, so that it's "local" to
other data and applications
• this can cause many problems:
  – wasted disk space
  – inconsistencies
  – need to enter the data multiple times
Backup and Recovery
• A DBMS can provide backup and recovery
• backup = snapshots of the data at particular times
• recovery = restoring the data to a consistent state
after a system crash
• the higher level semantics (relationships and
constraints)
can make it difficult to restore a consistent state
• transaction analysis can allow a DBMS to reconstruct
a consistent state from a number of backups
Views and Interfaces
• A DBMS can support
multiple user interfaces and user views
  – since the DBMS provides a well-defined data model and a
    persistent data dictionary, many different interfaces can be
    developed to access the same data
  – data independence ensures that these UIs will not be made
    invalid by most changes to the data
  – new user views can be supported as new schemas defined
    against the conceptual schema
Database Components
• Database contains:
  – User’s Data
  – Metadata
  – Indexes
  – Application Metadata
• DBMS
  – Design tools: Table Creation, Form Creation, Query Creation,
    Report Creation, Procedural language compiler (4GL)
  – Run time: Form processor, Query processor, Report Writer,
    Language Run time
• Applications
  – Application Programs
  – User Interface
Actors on DBMS
• Database Administrator
• Systems analyst
• Database designer
• Application programmer
• End user
Actors on the Scene
• Database Administrators
• acquiring a DBMS
• managing the system
• acquiring HW and SW to support the DBMS
• authorizing access (security policies)
• managing staff, including DB designers
Actors on the Scene
• Database Designers
• identifying the information of interest
in the Universe of Discourse (UoD)
• designing the database conceptual schema
• designing views for particular users
• designing the physical data layout and logical
schema
• adjusting data parameters for performance
Actors on the Scene
• Systems Analysts and Application Programmers
(generic database developers)
  – provide specialized knowledge to optimize database usage
  – provide generic (canned) application programs
Actors on the Scene
• End Users
• casual users: ad-hoc queries
• naïve or parametric users: canned queries
such as menus for a phone company customer
service agent
• sophisticated users: people who understand
the system and the data and use it in many
novel ways
• standalone users: people who use personal
easy-to-use databases for personal data
Three-Schema Architecture
• External Views (user-specific views)
• Conceptual Schema (generic view)
• Internal Schema (physical view)
Levels of Abstraction
• Views describe how users
see the data.
• Conceptual schema defines
logical structure
• Physical schema describes
the files and indexes used.
• (sometimes called the
ANSI/SPARC model)
[Figure: users see View 1, View 2, and View 3, which are defined over the Conceptual Schema, which sits on the Physical Schema and the DB]
Example: University Database
• Conceptual schema:
  – Students(sid: string, name: string, login: string,
    age: integer, gpa: real)
  – Courses(cid: string, cname: string, credits: integer)
  – Enrolled(sid: string, cid: string, grade: string)
• External Schema (View):
  – Course_info(cid: string, enrollment: integer)
• Physical schema:
  – Relations stored as unordered files.
  – Index on first column of Students.
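A minimal sketch of the three levels for the university example, using SQLite through Python. Several things are assumptions: SQLite has no `string` type, so `TEXT` is used; the definition of `Course_info` as a per-course count of `Enrolled` rows is one plausible reading of the slide; the index name and the sample rows are invented. SQLite also chooses its own storage layout, so only the index corresponds to the physical schema listed on the slide.

```python
import sqlite3


def build_university():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Conceptual schema: the three relations.
        CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
        CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
        CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

        -- Physical level: an index on the first column of Students.
        CREATE INDEX students_sid ON Students (sid);

        -- External schema: a view computed from the conceptual schema.
        CREATE VIEW Course_info AS
            SELECT cid, COUNT(*) AS enrollment
            FROM Enrolled
            GROUP BY cid;
    """)
    # Made-up sample data.
    conn.executemany(
        "INSERT INTO Enrolled VALUES (?, ?, ?)",
        [("s1", "CS101", "A"), ("s2", "CS101", "B"), ("s1", "CS202", "A")],
    )
    return conn


def course_info(conn):
    return conn.execute(
        "SELECT cid, enrollment FROM Course_info ORDER BY cid"
    ).fetchall()
```

`course_info(build_university())` returns `[('CS101', 2), ('CS202', 1)]`. Users of the view never see `Enrolled` or the index, and the index can be dropped or changed without touching the view.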
Levels of Abstraction
• Physical level: describes how a record (e.g., customer) is
stored.
• Logical level: describes data stored in database, and the
relationships among the data.
type customer = record
customer_id : string;
customer_name : string;
customer_street : string;
customer_city : string;
end;
• View level: application programs hide details of data
types. Views can also hide information (such as an
employee’s salary) for security purposes.
Conceptual Data Models
• A data model describes the possible schemas
(essentially the meta-schema)
• A DBMS is designed around a particular data model
  – this is what allows all system components (and humans)
    to understand the schema and data
• possible data models
  – relational,
    object-oriented, object-relational,
    entity-relationship,
    semantic, network, hierarchical, etc.
Physical Data Models
• A physical data model describes the way
in which data is stored in the computer
  – typically only of interest to database designers,
    implementers and maintainers … not end users
  – must provide a well-defined structure that can be
    mapped to the conceptual schema
  – allows optimization strategies to be defined
    generically
Instances and Schemas
• Similar to types and variables in programming languages
• Schema – the logical structure of the database
• Example: The database consists of information about a
set of customers and accounts and the relationship
between them
• Physical schema: database design at the physical level
• Logical schema: database design at the logical level
• Instance – the actual content of the database at a particular
point in time
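The distinction can be made concrete with a short sketch, assuming SQLite via Python. The `customer` table mirrors the record type used earlier in these slides, but the inserted row and the helper names `schema` and `instance` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_name TEXT)"
)


def schema(conn):
    # SQLite keeps the table definition in its catalog table sqlite_master.
    row = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'customer'"
    ).fetchone()
    return row[0]


def instance(conn):
    # The instance is whatever rows the table holds right now.
    return conn.execute(
        "SELECT customer_id, customer_name FROM customer ORDER BY customer_id"
    ).fetchall()


schema_before = schema(conn)
instance_before = instance(conn)

conn.execute("INSERT INTO customer VALUES ('192-83-7465', 'Johnson')")

schema_after = schema(conn)
instance_after = instance(conn)
```

The insert changes the instance (from an empty list to one row) while the schema stays exactly the same, much as assigning to a variable changes its value but not its type.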
Classification
• A DBMS can be classified by three criteria:
• Data model (relational, object, …)
• Number of users (single-user or multi-user)
• Number of sites (centralized or distributed)
Data model
A data model is a technique for organizing data, together with
concepts to describe the structure of the data,
its relationships, and its integrity constraints.
Database models
1. Relational data model: Oracle, Access
2. Hierarchical data model (as a tree): IMS DBMS
3. Network data model (as a graph): IDMS DBMS
4. Object-oriented model: VERSANT DBMS
5. Object-relational data model: UNISQL DBMS
Data Models
• Hierarchical Model (1960’s and 1970’s)
  – Similar to data structures in programming languages.
[Figure: Books (id, title), Authors (first, last), Publisher, Subjects]
Data Models
• Network Model (1970’s)
  – Provides for single entries of data and navigational “links”
    through chains of data.
[Figure: Authors, Subjects, Books, Publishers]
Data Models
• Object Oriented Data Model (1990’s)
  – Encapsulates data and operations as “Objects”
[Figure: Books (id, title), Authors (first, last), Publisher, Subjects]
Relational Model
• Example of tabular data in the relational model
[Figure: a table with its columns labeled as attributes]
A Sample Relational Database
Relational data model
• Based on the relations between data
• Each relation or table (entity) is a data structure or a
collection of attributes describing data
• Attribute or a field is a column in the table
• A tuple or record is a row in the table
Relational data model
• A null value assigned to an attribute
means that the attribute's value is not yet known
• A primary key is a unique identifier for the
rows of a table: one attribute or a combination of
attributes
Relational data model
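• A foreign key is an attribute
(or combination of attributes) in one
relation whose values are required to
match those of the primary key of some
relation
• A candidate key is any attribute (or minimal
combination of attributes) that uniquely identifies
each row; the primary key is the candidate key
chosen as the main identifier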
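The key concepts can be tried out in a small sketch, assuming SQLite through Python. The `department` and `employee` tables, their columns, and the `violates` helper are hypothetical; note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3


def build():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
        CREATE TABLE department (
            dept_no INTEGER PRIMARY KEY,     -- primary key
            name    TEXT NOT NULL UNIQUE     -- another candidate key
        );
        CREATE TABLE employee (
            emp_no  INTEGER PRIMARY KEY,
            phone   TEXT,                    -- may be NULL: not yet known
            dept_no INTEGER NOT NULL
                    REFERENCES department (dept_no)   -- foreign key
        );
        INSERT INTO department VALUES (1, 'Research');
        INSERT INTO employee VALUES (100, NULL, 1);
    """)
    return conn


def violates(conn, sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False
```

With `conn = build()`, inserting a second employee with `emp_no` 100 is rejected (duplicate primary key), and so is an employee whose `dept_no` is 99 (no matching department). An employee with a NULL phone in department 1 is accepted.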
New Trends in Databases
• Object-relational databases
• Main memory database systems
• XML XML XML !
  – Relational databases with XML support
  – Middleware between XML and relational databases
  – Native XML database systems
  – Lots of research here at UW on XML and databases
• Data integration
• Peer to peer, stream data management – still research
SQL
• SQL: widely used non-procedural language
  – Example: Find the name of the customer with customer-id 192-83-7465
      select customer.customer_name
      from customer
      where customer.customer_id = '192-83-7465'
  – Example: Find the balances of all accounts held by the customer with
    customer-id 192-83-7465
      select account.balance
      from depositor, account
      where depositor.customer_id = '192-83-7465' and
            depositor.account_number = account.account_number
• Application programs generally access databases through one of
  – Language extensions to allow embedded SQL
  – Application program interface (e.g., ODBC/JDBC) which allows SQL queries
    to be sent to a database
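Both example queries can be run through an application program interface. The sketch below uses Python's standard `sqlite3` module in place of ODBC/JDBC, which is a substitution made for convenience. The table layouts are inferred from the column names in the queries, and every row of sample data (names, account numbers, balances) is made up.

```python
import sqlite3


def build_bank():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer  (customer_id TEXT PRIMARY KEY, customer_name TEXT);
        CREATE TABLE account   (account_number TEXT PRIMARY KEY, balance INTEGER);
        CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
        INSERT INTO customer  VALUES ('192-83-7465', 'Johnson');
        INSERT INTO customer  VALUES ('019-28-3746', 'Smith');
        INSERT INTO account   VALUES ('A-101', 500);
        INSERT INTO account   VALUES ('A-201', 900);
        INSERT INTO account   VALUES ('A-215', 700);
        INSERT INTO depositor VALUES ('192-83-7465', 'A-101');
        INSERT INTO depositor VALUES ('192-83-7465', 'A-201');
        INSERT INTO depositor VALUES ('019-28-3746', 'A-215');
    """)
    return conn


def customer_name(conn, customer_id):
    # The "?" placeholder passes the id as data, not as part of the SQL text.
    row = conn.execute(
        "select customer.customer_name from customer "
        "where customer.customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0] if row else None


def balances(conn, customer_id):
    rows = conn.execute(
        """
        select account.balance
        from depositor, account
        where depositor.customer_id = ?
          and depositor.account_number = account.account_number
        order by account.balance
        """,
        (customer_id,),
    ).fetchall()
    return [balance for (balance,) in rows]
```

With this sample data, `customer_name(conn, '192-83-7465')` gives `'Johnson'` and `balances(conn, '192-83-7465')` gives `[500, 900]`. The `order by` is added only to make the output order predictable.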
The Entity-Relationship Model
• Models an enterprise as a collection of entities and
relationships
  – Entity: a “thing” or “object” in the enterprise that is
    distinguishable from other objects
    (described by a set of attributes)
  – Relationship: an association among several entities
• Represented diagrammatically by an entity-relationship
diagram.
Transaction Management
• A transaction is a collection of operations that
performs a single logical function in a database
application
• Transaction-management component ensures
that the database remains in a consistent
(correct) state despite system failures (e.g.,
power failures and operating system crashes)
and transaction failures.
• Concurrency-control manager controls the
interaction among the concurrent transactions,
to ensure the consistency of the database.
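A transfer between two accounts is the usual example of a transaction: a credit and a debit that must happen together or not at all. The sketch below assumes SQLite through Python; the `account` table, the `CHECK` constraint that forbids negative balances, and the sample balances are all invented for illustration.

```python
import sqlite3


def make_accounts():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        " account_number TEXT PRIMARY KEY,"
        " balance INTEGER CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO account VALUES (?, ?)", [("A-101", 500), ("A-215", 700)]
    )
    conn.commit()
    return conn


def transfer(conn, source, target, amount):
    """Move money as one transaction; return True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT account_number, balance FROM account"))
```

Transferring 600 out of A-101 fails on the debit, after the credit has already been applied; the rollback undoes the credit, so both balances are unchanged. A transfer of 200 commits and leaves 300 and 900.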
A UNIVERSITY example
• A UNIVERSITY database for maintaining information
concerning students, courses, and grades in a university
environment
• We have:
STUDENT file stores data on each student
COURSE file stores data on each course
SECTION file stores data on each section of each course
GRADE_REPORT file stores the grades that students
receive
PREREQUISITE file stores the prerequisites
Example of a simple database
COMPANY Database
• The company is organized into DEPARTMENTs. Each
department has a name, number, and an employee
who manages the department. We keep track of the
start date of the department manager. A department
may have several locations.
• Each department controls a number of PROJECTs.
Each project has a name, number, and is located at a
single location.
COMPANY Database
• We store each EMPLOYEE's social security number,
address, salary, sex, and birth date. Each employee
works for one department but may work on several
projects. We keep track of the number of hours per
week that an employee currently works on each
project. We also keep track of the direct supervisor of
each employee.
• Each employee may have a number of
DEPENDENTs. For each dependent, we keep their
name, sex, birth date, and relationship to the
employee.
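One possible relational design for these requirements is sketched below, assuming SQLite through Python. It is not the official schema for this example: every table name, column name, and data type is an assumption chosen for illustration, and dates are stored as text for simplicity.

```python
import sqlite3

COMPANY_SCHEMA = """
CREATE TABLE department (
    dnumber        INTEGER PRIMARY KEY,
    dname          TEXT NOT NULL UNIQUE,
    mgr_ssn        TEXT REFERENCES employee (ssn),  -- managing employee
    mgr_start_date TEXT
);
CREATE TABLE dept_location (       -- a department may have several locations
    dnumber   INTEGER NOT NULL REFERENCES department (dnumber),
    dlocation TEXT NOT NULL,
    PRIMARY KEY (dnumber, dlocation)
);
CREATE TABLE project (
    pnumber   INTEGER PRIMARY KEY,
    pname     TEXT NOT NULL,
    plocation TEXT,                -- a project has a single location
    dnum      INTEGER NOT NULL REFERENCES department (dnumber)
);
CREATE TABLE employee (
    ssn        TEXT PRIMARY KEY,
    address    TEXT,
    salary     INTEGER,
    sex        TEXT,
    birth_date TEXT,
    super_ssn  TEXT REFERENCES employee (ssn),      -- direct supervisor
    dno        INTEGER NOT NULL REFERENCES department (dnumber)
);
CREATE TABLE works_on (            -- hours per week per employee and project
    essn  TEXT NOT NULL REFERENCES employee (ssn),
    pno   INTEGER NOT NULL REFERENCES project (pnumber),
    hours REAL,
    PRIMARY KEY (essn, pno)
);
CREATE TABLE dependent (
    essn           TEXT NOT NULL REFERENCES employee (ssn),
    dependent_name TEXT NOT NULL,
    sex            TEXT,
    birth_date     TEXT,
    relationship   TEXT,
    PRIMARY KEY (essn, dependent_name)
);
"""


def build_company():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
    conn.executescript(COMPANY_SCHEMA)
    return conn
```

Because a department refers to its manager and an employee refers to a department, the two tables depend on each other. In this sketch `mgr_ssn` is nullable, so a department can be inserted first and its manager filled in once the employee row exists.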