DBMS LECTURE 01
DBMS INTRODUCTION AND CONCEPTS
MINAKSHI GUJRAL
DBMS contains information about a particular enterprise
• Collection of interrelated data
• Set of programs to access the data
• An environment that is both convenient and efficient to use
Database Applications:
1. Banking: all transactions
2. Airlines: reservations, schedules
3. Universities: registration, grades
4. Sales: customers, products, purchases
5. Online retailers: order tracking, customized recommendations
6. Manufacturing: production, inventory, orders, supply chain
7. Human resources: employee records, salaries, tax deductions
Purpose of Database Systems
DBMS
• Database: A collection of related data.
• Data: Known facts that can be recorded and have
an implicit meaning.
• Mini-world: Some part of the real world about
which data is stored in a database. For example,
student grades and transcripts at a university.
• Database Management System (DBMS): A
software package/ system to facilitate the
creation and maintenance of a computerized
database.
• Database System: The DBMS software together
with the data itself. Sometimes, the applications
are also included.
What is DBMS?
• Collection of interrelated information
• Set of programs to access the data
• DBMS contains information about a particular enterprise
• DBMS provides an environment that is both convenient and efficient to use, and in which data is kept safely
• Information management is the focus of all applications
• DBMS products and languages: MS Access, Oracle, SQL Server, FoxPro and Sybase (products), XQuery (a query language), etc.
Drawbacks of File Systems and Why DBMS over File System
– Data redundancy and inconsistency
– Difficulty in accessing data
– Data isolation
– Integrity problems
– Atomicity of updates
– Concurrent access by multiple users
– Security problems
In the early days, database applications were built directly on top of file systems.
Drawbacks of using file systems to store data:
1> Data redundancy and inconsistency
Multiple file formats, duplication of information in different files
2> Difficulty in accessing data
Need to write a new program to carry out each new task
3> Data isolation: multiple files and formats
4> Integrity problems
• Integrity constraints (e.g. account balance > 0) become “buried” in program code rather than being stated explicitly
• Hard to add new constraints or change existing ones
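The point about constraints being "buried" in program code can be made concrete with a small sketch using Python's built-in sqlite3 module. The table name account, its columns and the helper try_withdraw are hypothetical names chosen for illustration; only the rule balance > 0 comes from the slide. Here the rule is stated once in the table definition, so every program that updates the table is checked by the DBMS itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The integrity constraint is declared explicitly in the schema,
# not re-implemented inside each application program.
conn.execute(
    "CREATE TABLE account ("
    " account_no TEXT PRIMARY KEY,"
    " balance REAL NOT NULL CHECK (balance > 0))"
)
conn.execute("INSERT INTO account VALUES ('A-101', 500.0)")


def try_withdraw(account_no, amount):
    """Return True if the DBMS accepted the update, False if it rejected it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_no = ?",
                (amount, account_no),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_withdraw("A-101", 200.0)   # leaves a positive balance
rejected = try_withdraw("A-101", 900.0)   # would make the balance negative
final_balance = conn.execute(
    "SELECT balance FROM account WHERE account_no = 'A-101'"
).fetchone()[0]
```

In a file-based program the same check would have to be repeated in every routine that writes the balance, which is why adding or changing a rule is hard there.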
Functional components of the system:
• A record in the system consists of the student’s last name, first name and enrollment number (to make the database more realistic, you can optionally add more data such as address, date of birth, etc.).
• The system should also store information about the courses for which the student has registered, along with the name of the faculty member teaching each course.
• Certain operations can be performed on this database of student records:
– add student records to the system
– display all the records in the system
– sort the records in various ways
– search for a particular student record
– delete a student record from the system
– retrieve the list of subjects for which a student has registered
– retrieve the names of the faculty teaching the courses for which a student has registered
Problem statement
Write a menu-driven program in C that uses a file to store the student records
of the university in the form of structures and performs the above
operations.
Observations
• Does the system that you have developed for the above problem statement
have any redundancy of data?
• Is there any scope for inconsistency in the data files?
• What changes will you have to incorporate each time a new
functionality is to be added to the system?
• Will it be possible for you to allow different users of the system (students,
faculty, registrar's office, etc.) to perform different operations and have different
privileges for accessing data?
Main Characteristics of the Database Approach
• Self-describing nature of a database system:
A DBMS catalog stores the description of the
database. The description is called meta-data.
This allows the DBMS software to work with
different databases.
• Insulation between programs and data:
Called program-data independence. Allows
changing data storage structures and operations
without having to change the DBMS access
programs.
• Data Abstraction: A data model is used to hide
storage details and present the users with a
conceptual view of the database.
• Support of multiple views of the data: Each user
may see a different view of the database, which
describes only the data of interest to that user.
• Sharing of data and multi-user transaction
processing : allowing a set of concurrent users to
retrieve and to update the database. Concurrency
control within the DBMS guarantees that each
transaction is correctly executed or completely
aborted. OLTP (Online Transaction Processing) is a
major part of database applications.
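A rough sketch of "correctly executed or completely aborted", again with Python's sqlite3 module. The account table and the transfer helper are hypothetical. The sketch uses a single connection, so it shows only the all-or-nothing behaviour of a transaction and does not exercise concurrent users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_no TEXT PRIMARY KEY, balance REAL NOT NULL)"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100.0), ("B", 50.0)])
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst as one transaction; return True on commit."""
    try:
        with conn:  # commits if the block finishes, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_no = ?",
                (amount, src),
            )
            (remaining,) = conn.execute(
                "SELECT balance FROM account WHERE account_no = ?", (src,)
            ).fetchone()
            if remaining < 0:
                # Abort midway: the debit above must not survive.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_no = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances():
    """Current balances as a dict keyed by account number."""
    return dict(conn.execute("SELECT account_no, balance FROM account"))


ok = transfer("A", "B", 30.0)
after_ok = balances()
failed = transfer("A", "B", 500.0)
after_failed = balances()
```

The second transfer debits account A before it notices the shortfall, yet after the rollback neither balance has changed.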
Advantages of Using the Database Approach
• Controlling redundancy in data storage and in
development and maintenance efforts.
• Sharing of data among multiple users.
• Restricting unauthorized access to data.
• Providing persistent storage for program objects
• Providing storage structures for efficient query processing
• Providing backup and recovery services.
• Providing multiple interfaces to different
classes of users.
• Representing complex relationships among
data.
• Enforcing integrity constraints on the
database.
• Drawing Inferences and Actions using rules
• Potential for enforcing standards: this is
crucial for the success of database
applications in large organizations.
Standards refer to data item names,
display formats, screens, report
structures, meta-data (description of data),
etc.
• Reduced application development
time: incremental time to add each new
application is reduced.
• Flexibility to change data structures: database
structure may evolve as new requirements are
defined.
• Availability of up-to-date information – very
important for on-line transaction systems such as
airline, hotel, car reservations.
• Economies of scale: by consolidating data and
applications across departments, wasteful overlap
of resources and personnel can be avoided.
Historical Development of Database Technology
• Early Database Applications: The Hierarchical and
Network Models were introduced in the mid-1960s and
dominated during the 1970s. A large share of worldwide
database processing still occurs using these models.
• Relational Model based Systems: The model, which was
originally introduced in 1970, was heavily researched and
experimented with at IBM and the universities. Relational
DBMS products emerged in the 1980s.
• Object-oriented applications: OODBMSs were
introduced in the late 1980s and early 1990s to cater to the
need for complex data processing in CAD and other
applications.
• Data on the Web and E-commerce Applications: The Web
contains data in HTML with links among pages. This has
given rise to a new set of applications, and E-commerce
is using new standards like XML.
Types of Databases and Database Applications
• Numeric and Textual Databases
• Multimedia Databases
• Geographic Information Systems (GIS)
• Data Warehouses
• Real-time and Active Databases
Typical DBMS Functionality
• Define a database: in terms of data types,
structures and constraints
• Construct or load the database on a
secondary storage medium
• Manipulate the database: querying,
generating reports, insertions, deletions and
modifications to its content
• Concurrent processing and sharing by a set
of users and programs, while keeping all data
valid and consistent
Other features:
– Protection or Security measures to prevent
unauthorized access
– “Active” processing to take internal actions on
data
– Presentation and Visualization of data
When not to use a DBMS
• When no DBMS may suffice:
– If the database system is not able to handle the
complexity of data because of modeling limitations
– If the database users need special operations not
supported by the DBMS.
• Main inhibitors (costs) of using a DBMS:
– High initial investment and possible need for
additional hardware.
– Overhead for providing generality, security,
concurrency control, recovery, and integrity
functions.
• When a DBMS may be unnecessary:
– If the database and applications are simple, well
defined, and not expected to change.
– If there are stringent real-time requirements that may
not be met because of DBMS overhead.
– If access to data by multiple users is not required.
BRAIN STORMING – interview questions
1. What is a schema? What is an instance?
2. Give the internal, conceptual and external schemas for an event management system.
3. What are the physical level and the logical level of data?
4. What is a tuple?
5. Explain DML and DDL.
6. What is the relational model? What is a key?
7. What are the intension and extension of a relation?
8. What are integrity constraints? Are they different from key constraints?
9. What is a view, and how is it different from a table?
10. What is N-tier client-server architecture?
11. Explain logical and physical data independence with respect to a reservation system.
12. Explain ODBC and centralized DBMS.
INDEX
Database System Concepts and Architecture
• Data Models
• Instances and Schema
• View of Data and Levels of Abstraction
• Data Independence
• DBMS Language/Classification of DBMS
• Client-Server Architecture
Data Models
• Data Model: A set of concepts to
describe the structure of a database,
and certain constraints that the
database should obey.
• Relational model
• Entity-Relationship data model
• Object-based data models
• Semi structured data model (XML)
• Other older models:
– Network model
– Hierarchical model
For more on these concepts, see the previous slides.
Instances and Schemas
• Similar to types and variables in programming
languages
• Schema – the logical structure of the database
– Example: The database consists of information about a set of
customers and accounts and the relationship between them
– Analogous to type information of a variable in a program
• Instance – the actual content of the database at a
particular point in time
– Analogous to the value of a variable
Distinction: The database schema changes very infrequently. The
database state changes every time the database is updated.
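The schema/instance distinction can be observed directly in a small sketch with Python's sqlite3 module. The customer table and the helper names schema and instance are assumptions made for illustration. SQLite keeps table definitions in its catalog table sqlite_master, so the stored definition can be compared before and after the data changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_name TEXT)"
)


def schema():
    """The stored table definition, read from SQLite's catalog."""
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'customer'"
    ).fetchone()[0]


def instance():
    """The current rows, i.e. the database state at this moment."""
    return conn.execute("SELECT * FROM customer ORDER BY customer_id").fetchall()


schema_before, state_before = schema(), instance()
conn.execute("INSERT INTO customer VALUES ('C-1', 'Smith')")
conn.execute("INSERT INTO customer VALUES ('C-2', 'Jones')")
schema_after, state_after = schema(), instance()
```

The two inserts change the state from empty to two tuples, while the schema text read from the catalog is identical before and after.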
Three-Schema Architecture
• Proposed to support DBMS characteristics of:
– Program-data independence.
– Support of multiple views of the data.
• Defines DBMS schemas at three levels:
– Internal schema at the internal level to describe physical storage
structures and access paths. Typically uses a physical data
model.
– Conceptual schema at the conceptual level to describe the
structure and constraints for the whole database for a
community of users. Uses a conceptual or an implementation
data model.
– External schemas at the external level to describe the various
user views. Usually uses the same data model as the
conceptual level.
• Mappings among schema levels are needed to
transform requests and data. Programs refer to an
external schema, and are mapped by the DBMS to the
internal schema for execution.
View of Data
An architecture for a database system
Levels of Abstraction
• Physical level: describes how a record (e.g.,
customer) is stored.
• Logical level: describes data stored in database, and
the relationships among the data.
type customer = record
    customer_id : string;
    customer_name : string;
    customer_street : string;
    customer_city : string;
end;
• View level: application programs hide details of data types.
Views can also hide information (such as an employee’s salary) for
security purposes.
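A minimal sketch of the view level hiding a column, using Python's sqlite3 module. The employee table, its sample row and the view name employee_public are hypothetical. SQLite has no user accounts, so this only shows what a view exposes; in a multi-user DBMS one would additionally grant users access to the view and not to the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 52000.0)")

# The view presents the same employees but leaves the salary column out.
conn.execute("CREATE VIEW employee_public AS SELECT emp_id, name FROM employee")

cursor = conn.execute("SELECT * FROM employee_public")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
```

A program written against employee_public never sees the salary value, although it is still stored in the underlying table.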
Data Independence
When a schema at a lower level is changed, only the mappings
between this schema and higher-level schemas need to be
changed in a DBMS that fully supports data independence.
The higher-level schemas themselves are unchanged. Hence, the
application programs need not be changed since they refer to the
external schemas.
• Logical Data Independence: The capacity to change the
conceptual schema without having to change the external
schemas and their application programs.
• Physical Data Independence: The capacity to change the
internal schema without having to change the conceptual schema.
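Physical data independence can be sketched by changing an access path and re-running an unchanged query. The example assumes SQLite through Python's sqlite3 module; the customer table, the sample rows and the index name idx_customer_city are hypothetical. EXPLAIN QUERY PLAN is SQLite-specific, and its text output is meant for inspection rather than as a stable interface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id TEXT, customer_city TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("C-1", "Delhi"), ("C-2", "Pune"), ("C-3", "Delhi")],
)

# The application's query: it never mentions storage structures.
QUERY = "SELECT customer_id FROM customer WHERE customer_city = ? ORDER BY customer_id"


def run_query():
    return conn.execute(QUERY, ("Delhi",)).fetchall()


def plan():
    # The last column of each plan row is a human-readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("Delhi",))
    return " ".join(str(row[-1]) for row in rows)


rows_before, plan_before = run_query(), plan()
# Change at the internal level only: add an index on customer_city.
conn.execute("CREATE INDEX idx_customer_city ON customer (customer_city)")
rows_after, plan_after = run_query(), plan()
```

The query text and its result stay the same. Only the plan chosen by the DBMS may differ, and on typical SQLite builds plan_after mentions the new index.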
DBMS Languages
• Data Definition Language (DDL): Used by the DBA and
database designers to specify the conceptual schema of a
database.
• In many DBMSs, the DDL is also used to define internal
and external schemas (views).
• In some DBMSs, separate storage definition language
(SDL) and view definition language (VDL) are used to
define internal and external schemas.
• DDL compiler generates a set of tables stored in a data dictionary.
• Data dictionary contains metadata (i.e., data about data):
– Database schema
– Data storage and definition language: specifies the storage structure and access methods used
– Integrity constraints:
    - Domain constraints
    - Referential integrity (references constraint in SQL)
    - Assertions
– Authorization
• Data Manipulation Language (DML): Used to specify
database retrievals and updates.
• DML commands (data sublanguage) can be embedded in a
general-purpose programming language (host language), such
as COBOL, C or an Assembly Language.
• Alternatively, stand-alone DML commands can be applied
directly (query language).
• Two classes of languages
– Procedural – user specifies what data is required and how
to get those data
– Declarative (nonprocedural) – user specifies what data is
required without specifying how to get those data, e.g. SQL
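The two classes of languages can be contrasted on one small task: list the names of students majoring in CS, in alphabetical order. The data and function names below are hypothetical. The first function spells out how to get the result, while the second states what is wanted in SQL and leaves the how to the DBMS (SQLite through Python's sqlite3 module).

```python
import sqlite3

rows = [("Asha", "CS", 8.1), ("Ravi", "EE", 7.4), ("Meera", "CS", 9.0)]


def cs_students_procedural(records):
    """Procedural: the program decides how to scan, filter and sort."""
    selected = []
    for name, major, _gpa in records:
        if major == "CS":
            selected.append(name)
    selected.sort()
    return selected


def cs_students_declarative(records):
    """Declarative: the query states the result; the DBMS plans the access."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (name TEXT, major TEXT, gpa REAL)")
    conn.executemany("INSERT INTO student VALUES (?, ?, ?)", records)
    result = conn.execute(
        "SELECT name FROM student WHERE major = 'CS' ORDER BY name"
    ).fetchall()
    conn.close()
    return [name for (name,) in result]


procedural_result = cs_students_procedural(rows)
declarative_result = cs_students_declarative(rows)
```

Both functions return the same list. The difference lies in who chooses the access strategy.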
Classification of DBMS
• Based on the data model used:
• Traditional: Relational, Network, Hierarchical.
• Emerging: Object-oriented, Object-relational.
• Other classifications:
• Single-user (typically used with microcomputers) vs. multi-user (most DBMSs).
• Centralized (uses a single computer with one
database) vs. distributed (uses multiple
computers, multiple databases)
Client-Server Architectures
Centralized DBMS: combines everything into a single system,
including DBMS software, hardware,
application programs and user interface
processing software.
Basic Client-Server Architectures
•Specialized Servers with Specialized functions
•Clients
•DBMS Server
Specialized Servers with Specialized Functions
• File Servers
• Printer Servers
• Web Servers
• E-mail Servers
Clients
• Provide appropriate interfaces and a client-version of the
system to access and utilize the server resources.
– Clients may be diskless machines or PCs or
workstations with disks with only the client software
installed.
– Connected to the servers via some form of a network
(LAN: local area network, wireless network, etc.)
DBMS Server
• Provides database query and transaction services to the
clients.
• Sometimes called query and transaction servers.
Two Tier Client-Server Architecture
• User Interface Programs and Application
Programs run on the client side
• Interface called ODBC (Open Database
Connectivity) provides an application program
interface (API) that allows client-side programs to call the
DBMS. Most DBMS vendors provide ODBC
drivers.
- A client program may connect to several DBMSs.
- Other variations of clients are possible: e.g., in some
DBMSs, more functionality is transferred to clients
including data dictionary functions, optimization and
recovery across multiple servers, etc. In such situations
the server may be called the Data Server.
Three Tier Client-Server Architecture
• Common for Web applications
• Intermediate Layer called Application Server or Web
Server:
• stores the web connectivity software and the rules
and business logic (constraints) part of the
application used to access the right amount of data
from the database server
• acts like a conduit for sending partially processed data
between the database server and the client.
• Additional Features- Security:
• encrypt the data at the server before transmission
• decrypt data at the client
The Relational Data Model and
Relational Database Constraints
Outline
• Relational Model Concepts
• Relational Model Constraints and Relational
Database Schemas
• Update Operations and Dealing with Constraint
Violations
Relational Data Model
Relational Model Concepts
• The relational Model of Data is based on the concept of a
Relation.
• A Relation is a mathematical concept based on the ideas of
sets.
• The strength of the relational approach to data
management comes from the formal foundation provided
by the theory of relations.
• The model was first proposed by Dr. E.F. Codd of IBM in
1970 in the following paper:
"A Relational Model for Large Shared Data Banks,"
Communications of the ACM, June 1970.
The above paper caused a major revolution in the field of
Database management and earned Ted Codd the coveted
ACM Turing Award.
INFORMAL DEFINITIONS
RELATION: A table of values
– A relation may be thought of as a set of rows.
– A relation may alternately be thought of as a set of
columns.
– Each row represents a fact that corresponds to a
real-world entity or relationship.
– Each row has a value of an item or set of items that
uniquely identifies that row in the table.
– Sometimes row-ids or sequential numbers are
assigned to identify the rows in the table.
– Each column typically is called by its column name
or column header or attribute name.
FORMAL DEFINITIONS
• A tuple is an ordered set of values
• Each value is derived from an appropriate domain.
• Each row in the CUSTOMER table may be referred to
as a tuple in the table and would consist of four values.
• <632895, "John Smith", "101 Main St. Atlanta, GA
30332", "(404) 894-2000">
is a tuple belonging to the CUSTOMER relation.
• A relation may be regarded as a set of tuples (rows).
• Columns in a table are also called attributes of the
relation.
• A domain has a logical definition: e.g.,
“phone_numbers” are the set of 10-digit
phone numbers.
• A domain may have a data-type or a format
defined for it.
1. The USA_phone_numbers may
have a format: (ddd)-ddd-dddd
where each d is a decimal digit.
2. E.g., Dates have various formats
such as monthname, date, year or
yyyy-mm-dd, or dd mm,yyyy etc.
• An attribute designates the role played by
the domain. E.g., the domain Date may
be used to define attributes “Invoice-date” and “Payment-date”.
Let S1 = {0,1}
Let S2 = {a,b,c}
Let R ⊆ S1 × S2
Then for example: r(R) = {<0,a> , <0,b> , <1,c> }
is one possible “state” or “population” or “extension” r
of the relation R, defined over domains S1 and S2.
It has three tuples.
R: schema of the relation
r of R: a specific "value" or population of R.
R is also called the intension of a relation
r is also called the extension of a relation
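The definition above can be checked mechanically. In this short Python sketch the domains S1 and S2 and the state r are the ones from the slide. The variable names cartesian and is_valid_state are introduced only for illustration.

```python
from itertools import product

S1 = {0, 1}
S2 = {"a", "b", "c"}

# All pairs <s1, s2>: the Cartesian product S1 × S2.
cartesian = set(product(S1, S2))

# One possible state (extension) of the relation: a set of tuples.
r = {(0, "a"), (0, "b"), (1, "c")}

# A state is valid when every tuple draws its values from the domains,
# i.e. when r is a subset of the Cartesian product.
is_valid_state = r <= cartesian
```

S1 × S2 has 2 × 3 = 6 tuples, so this relation has 2^6 = 64 possible states, of which r is one.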
DEFINITION SUMMARY
Informal Terms         Formal Terms
Table                  Relation
Column                 Attribute/Domain
Row                    Tuple
Values in a column     Domain
Table Definition       Schema of a Relation
Populated Table        Extension
CHARACTERISTICS OF RELATIONS
• Ordering of tuples in a relation r(R): The tuples are
not considered to be ordered, even though they
appear to be in the tabular form.
• Values in a tuple: All values are considered atomic
(indivisible). A special null value is used to represent
values that are unknown or inapplicable to certain
tuples.
Relational Integrity Constraints
• Constraints are conditions that must hold on
all valid relation instances. There are three
main types of constraints:
1. Key constraints
2. Entity integrity constraints
3. Referential integrity constraints
Key Constraints
• Superkey of R: A set of attributes SK of
R such that no two tuples in any valid
relation instance r(R) will have the same
value for SK. That is, for any distinct
tuples t1 and t2 in r(R), t1[SK] ≠ t2[SK].
• Key of R: A "minimal" superkey; that is, a
superkey K such that removal of any
attribute from K results in a set of
attributes that is not a superkey.
Example: The CAR relation schema:
CAR(State, Reg#, SerialNo, Make, Model, Year)
has two keys Key1 = {State, Reg#}, Key2 =
{SerialNo}, which are also superkeys. {SerialNo,
Make} is a superkey but not a key.
• If a relation has several candidate keys, one is
chosen arbitrarily to be the primary key. The
primary key attributes are underlined.
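The superkey and key definitions can be tried out on a concrete relation instance. In this Python sketch the three CAR tuples are made-up sample data and the function names are hypothetical. A superkey is a property of the schema that must hold in every valid instance, so a check against one instance can only refute a candidate and never prove it.

```python
def is_superkey(rows, attrs):
    """True if no two rows of this instance agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key(rows, attrs):
    """True if attrs is a superkey here and removing any one attribute breaks that."""
    if not is_superkey(rows, attrs):
        return False
    return all(
        not is_superkey(rows, [a for a in attrs if a != removed])
        for removed in attrs
    )


# Hypothetical sample instance of CAR(State, Reg#, SerialNo, Make, Model, Year).
cars = [
    {"State": "TX", "Reg#": "ABC-739", "SerialNo": "A69352",
     "Make": "Ford", "Model": "Mustang", "Year": 2002},
    {"State": "FL", "Reg#": "ABC-739", "SerialNo": "B43696",
     "Make": "Ford", "Model": "Mustang", "Year": 2005},
    {"State": "TX", "Reg#": "TVP-347", "SerialNo": "X83554",
     "Make": "Oldsmobile", "Model": "Cutlass", "Year": 2005},
]

key1_is_key = is_key(cars, ["State", "Reg#"])
key2_is_key = is_key(cars, ["SerialNo"])
serial_make_is_superkey = is_superkey(cars, ["SerialNo", "Make"])
serial_make_is_key = is_key(cars, ["SerialNo", "Make"])
make_is_superkey = is_superkey(cars, ["Make"])
```

On this sample, {State, Reg#} and {SerialNo} behave as keys, while {SerialNo, Make} is a superkey but not minimal, matching the slide's example.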
Entity Integrity
• Relational Database Schema: A set S of relation
schemas that belong to the same database. S is the
name of the database.
S = {R1, R2, ..., Rn}
• Entity Integrity: The primary key attributes PK of
each relation schema R in S cannot have null values
in any tuple of r(R). This is because primary key
values are used to identify the individual tuples.
t[PK] ≠ null for any tuple t in r(R)
• Note: Other attributes of R may be similarly
constrained to disallow null values, even though they
are not members of the primary key.
Referential Integrity
• A constraint involving two relations (the previous
constraints involve a single relation).
• Used to specify a relationship among tuples in two
relations: the referencing relation and the
referenced relation.
• Tuples in the referencing relation R1 have attributes
FK (called foreign key attributes) that reference the
primary key attributes PK of the referenced relation
R2. A tuple t1 in R1 is said to reference a tuple t2 in R2
if t1[FK] = t2[PK].
• A referential integrity constraint can be displayed in a
relational database schema as a directed arc from
R1.FK to R2.
Referential Integrity Constraint
Statement of the constraint:
The value in the foreign key column (or columns)
FK of the referencing relation R1 can be either:
(1) a value of an existing primary key value of the
corresponding primary key PK in the referenced
relation R2, or
(2) a null.
In case (2), the FK in R1 should not be a part of its own
primary key.
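The two permitted cases, plus the forbidden one, can be sketched with Python's sqlite3 module. The department and employee tables are hypothetical stand-ins for the referenced relation R2 and the referencing relation R1. SQLite checks foreign keys only when the foreign_keys pragma is switched on, so the sketch enables it first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite

# Referenced relation R2 with primary key dnumber.
conn.execute("CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT)")
# Referencing relation R1; dno is the foreign key FK.
conn.execute(
    "CREATE TABLE employee ("
    " ssn TEXT PRIMARY KEY,"
    " dno INTEGER REFERENCES department (dnumber))"
)
conn.execute("INSERT INTO department VALUES (5, 'Research')")


def try_insert_employee(ssn, dno):
    """Return True if the DBMS accepted the tuple, False if it was rejected."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (ssn, dno))
        return True
    except sqlite3.IntegrityError:
        return False


existing_pk = try_insert_employee("111", 5)    # case (1): matches a PK value
null_fk = try_insert_employee("222", None)     # case (2): null is allowed
dangling = try_insert_employee("333", 9)       # no department 9: rejected
employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
```

The null is acceptable here because dno is not part of the primary key of employee, in line with the note on case (2).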
Other Types of Constraints
Semantic Integrity Constraints:
- based on application semantics and
cannot be expressed by the model.
- E.g., “the max. no. of hours per
employee for all projects he or she
works on is 56 hrs per week”
Update Operations on Relations
• INSERT a tuple.
• DELETE a tuple.
• MODIFY a tuple.
• Integrity constraints should not be violated by
the update operations.
• Several update operations may have to be
grouped together.
• Updates may propagate to cause other
updates automatically. This may be necessary
to maintain integrity constraints.
In case of integrity violation, several actions can be taken:
• Cancel the operation that causes the violation (REJECT option)
• Perform the operation but inform the user of the violation
• Trigger additional updates so the violation is corrected (CASCADE option, SET NULL option)
• Execute a user-specified error-correction routine
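Three of these options map onto SQL foreign key clauses, sketched here with Python's sqlite3 module. All table and column names are hypothetical. ON DELETE SET NULL and ON DELETE CASCADE trigger additional updates, and a foreign key declared without any clause makes SQLite reject the delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute("CREATE TABLE department (dnumber INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE employee (ssn TEXT PRIMARY KEY,"
    " dno INTEGER REFERENCES department (dnumber) ON DELETE SET NULL)"
)
conn.execute(
    "CREATE TABLE dept_location ("
    " dnumber INTEGER REFERENCES department (dnumber) ON DELETE CASCADE,"
    " location TEXT, PRIMARY KEY (dnumber, location))"
)
# No ON DELETE clause: a delete that would leave a dangling reference is rejected.
conn.execute(
    "CREATE TABLE project (pnumber INTEGER PRIMARY KEY,"
    " dnum INTEGER REFERENCES department (dnumber))"
)

conn.executemany("INSERT INTO department VALUES (?)", [(4,), (5,)])
conn.execute("INSERT INTO employee VALUES ('111', 5)")
conn.execute("INSERT INTO dept_location VALUES (5, 'Houston')")
conn.execute("INSERT INTO project VALUES (10, 4)")


def try_delete_department(dnumber):
    """Return True if the delete went through, False if the DBMS rejected it."""
    try:
        conn.execute("DELETE FROM department WHERE dnumber = ?", (dnumber,))
        return True
    except sqlite3.IntegrityError:
        return False


deleted_5 = try_delete_department(5)   # propagates: SET NULL and CASCADE fire
deleted_4 = try_delete_department(4)   # rejected: project 10 still refers to it
employee_dno = conn.execute("SELECT dno FROM employee WHERE ssn = '111'").fetchone()[0]
remaining_locations = conn.execute("SELECT COUNT(*) FROM dept_location").fetchone()[0]
remaining_departments = [d for (d,) in conn.execute("SELECT dnumber FROM department")]
```

Deleting department 5 sets the employee's dno to null and removes the location row, while department 4 survives because the delete is rejected.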
In-Class Exercise
Consider the following relations for a database that keeps track of
student enrollment in courses and the books adopted for each
course:
STUDENT(SSN, Name, Major, Bdate)
COURSE(Course#, Cname, Dept)
ENROLL(SSN, Course#, Quarter, Grade)
BOOK_ADOPTION(Course#, Quarter, Book_ISBN)
TEXT(Book_ISBN, Book_Title, Publisher, Author)
Draw a relational schema diagram specifying the foreign keys
for this schema.