Databases

Contents by topics:
 1 Introduction
 2 Data modelling
 3 Relational database languages
 4 SQL language for relational databases
 5 Physical database structure
 6 Implementation of relational operations
 7 Data integrity and safety
 8 Examples of relational calculus and relational algebra
 9 References
1 INTRODUCTION
Database definition
A database is a collection of related data
(Ramez Elmasri, Shamkant B. Navathe,
Fundamentals of Database Systems, Addison-Wesley)
5.5.2017.
Željko Knok
1.1 Data storage media

Non-electronic media
• Punched paper cards
• Punched paper tapes

Electronic media
• Hard disks
• Tapes
• CD
• DVD
• USB flash drives
1.2 Data organisation
All data are stored in files, which may be:

Classical files (sequential, relative, indexed…)

Files organised in databases
• A database represents one aspect (abstraction) of a part of the real world. Changes in that part of the world are reflected in the database
• A database is designed, built and filled with data for a specific purpose
• A database is a set of mutually connected data stored in external computer memory
1.3 Database architecture

Physical level: the representation and arrangement of data on external memory units.

Global logical level: the logical structure of the entire database, as seen by the database designer and administrator. The record of the logical definition is called a schema.

Local logical level: the logical image of the part of the database used by a specific application. The record of one local logical definition is called a view.
1.4 Database Management System

A DBMS (Database Management System) is a set of programs which enables a user to create and maintain a database (database server).

A DBMS is a general-purpose software system which:
• Creates a physical representation of a database in accordance with the required logical structure
• Performs all data operations
• Takes care of data safety
• Automates database administration tasks
1.5 Data model
A data model is the set of rules which defines the logical structure of a database and is the basis for:
• drafting
• designing
• implementing the database
1.6 Database models
• Relational model
Data and the links between data are shown by "rectangular" tables.
• Object model
Inspired by object-oriented programming languages: the database is a set of permanently stored objects, which consist of internal data and "methods" (operations) for manipulating the data.
Each object belongs to a class. Inheritance relationships are established between classes, so that operations can be shared.
1.7 Objectives
• Physical data independence
• Logical data independence
• Flexible data access
• Simultaneous data access
• Integrity protection
• Possibility of recovery after failure
• Protection against unauthorised use
• Satisfactory access speed
• Possibility of tuning and control
1.8 Database example
• A database containing university data on:
  • students
  • courses
  • teachers
  • exams
  • study programmes…
The data in a database are organised as sets of data with the same properties: entity sets.
• For each entity set, the basic elements (attributes) should be defined, e.g.
  Student (Student course book, Name, Surname, Date of birth, Study programme...)
• For each attribute, the data type must be defined, e.g.
  Name is a string(20), Student course book is an integer, Date of birth is a date…
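Example of student data (structure). In the figure, the column headings are labelled as the entity attributes (the database schema), one column is marked as the primary key, and one row is marked as the description of an entity (a tuple).

Student course book | Name  | Surname | Date of birth | Study programme
--------------------+-------+---------+---------------+----------------
23401               | Marko | Marić   | 11.12.1985    | Računalstvo
23402               | Ana   | Anić    | 07.06.1990    | Menadžment
23403               | Iva   | Ivić    | 12.11.1991    | Menadžment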
1.9 Database compared to file organisation
• Properties of file organisation
  • Files contain only data. The data structure is described in the program
  • Information on the relationships between the data is not stored in files
  • Specific data properties are not stored in files
  • Undefined data cannot be stored
• Database properties
  • Databases contain both the data and the data description
  • Databases keep data about the relationships between data
  • Databases store the properties of individual values
  • Undefined data can be used
  • Access control is enabled
  • Update control is enabled
1.10 Database users
• DB administrator: administers the database
• DB designer: designs the database
• Database end users
  • occasional users
  • inexperienced users
  • sophisticated users
• Behind-the-scenes users
  • DBMS designers and implementers
  • Tool creators
  • Operators and maintenance staff
2 DATA MODELLING
How do we create a database schema that conforms to the rules of the relational model?
2.1 Entity-relationship modelling
• The E-R model represents the conceptual schema, as an abstraction of the real world
• In entity-relationship modelling the world is observed through three categories:
  • Entities: objects or events of interest
  • Relationships: relationships between the entities of interest
  • Attributes: properties of entities and relationships that are of interest
2.2 Entities and attributes
• An entity is anything in the real world:
  • Object
  • Event
  • Phenomenon
• Attributes describe an entity
  • e.g. the attributes of a house are: address, number of floors, facade colour…
  • some attributes can have attributes of their own; such an attribute should be treated as a new entity (e.g. car model)
• The entity name, together with the associated attributes, defines the entity type
• A candidate key is an attribute or set of attributes whose values uniquely identify an instance of the entity type
2.3 Relationships
• Relationships are established between two or more entity types
  (e.g. the relationship PLAY_FOR between the entity types PLAYER and TEAM).
  A relationship is a binary or k-ary relation between instances of entity types.
• The functionality of a relationship can be:
1. One-to-one (1:1)
   e.g. the relationship IS_HEAD between the entity types TEACHER and DEPARTMENT (college department)
2. One-to-many (1:N)
   e.g. the relationship TEACH between the entity types TEACHER and COURSE
3. Many-to-many (M:N)
   e.g. the relationship ENROLLED between the entity types STUDENT and COURSE
2.4 Complex relationships
Real situations give rise to relationships that are more complex than those mentioned so far:
• Involuted relationships
  An entity type is related to itself
• Sub-types
  Entity type E1 is a sub-type of entity type E2 if every instance of E1 is also an instance of E2
• Ternary relationships
  A relationship is established between three entity types,
  e.g. companies, the products they manufacture and the countries to which they export their products
2.5 E-R schema diagram
An E-R schema is usually shown as a diagram in which rectangles represent entity types and rhombi represent relationships.
[Figure: E-R diagram of the university database with the entity types DEPARTMENT, COURSE, STUDENT and TEACHER and the relationships offers, is_head, teaches, enrolled and is_in, each end labelled with its cardinality (1, N or M).]
2.6 The role of E-R schema
• The E-R model is simple enough to be used by people of different professions.
• The E-R schema serves for communication between the database designer and the user in the earliest stage of database development.
• Existing DBMSs cannot implement an E-R schema directly, so it has to be translated further.
2.7 Relational model
• Data and the relationships between the data are shown by "rectangular" tables.
• In the mid-1980s the relational model prevailed, and today most DBMSs use it.
• A database consists of a set of rectangular tables, called relations.
• Each relation has a name, which distinguishes it from the other relations in the same database.
• The columns of a relation represent attributes, and all values of an attribute have the same data type.
• A row is called a tuple. A relation cannot contain two identical tuples.
• The number of attributes is the degree of a relation, and the number of tuples is its cardinality.
2.8 Example of a relation: CAR

CAR
REG_NUMBER | MANUFACTURER | MODEL   | YEAR
-----------+--------------+---------+-----
CD234      | Ford         | Fiesta  | 1997
XC294      | Nissan       | Primera | 1998
AU930      | Ford         | Escort  | 2002
PD402      | Fiat         | Punto   | 2008
VE838      | Volkswagen   | Golf    | 2005
The key REG_NUMBER of the relation CAR is a subset of the attributes of CAR with the following properties:
1. The values of the attributes in REG_NUMBER uniquely identify a tuple in CAR. So CAR cannot contain two tuples with the same values of the attributes in REG_NUMBER.
2. If any attribute is removed from REG_NUMBER, property 1 no longer holds.
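The two key properties can be checked mechanically against the sample data. The following is a minimal Python sketch, assuming the CAR tuples shown in the example; the helper names `is_unique` and `is_key` are illustrative and not part of the original material.

```python
from itertools import combinations

# The CAR relation from the example: attribute names and tuples.
ATTRS = ("REG_NUMBER", "MANUFACTURER", "MODEL", "YEAR")
CAR = [
    ("CD234", "Ford", "Fiesta", 1997),
    ("XC294", "Nissan", "Primera", 1998),
    ("AU930", "Ford", "Escort", 2002),
    ("PD402", "Fiat", "Punto", 2008),
    ("VE838", "Volkswagen", "Golf", 2005),
]

def is_unique(attrs):
    """Property 1: no two tuples agree on all attributes in attrs."""
    positions = [ATTRS.index(a) for a in attrs]
    seen = {tuple(row[i] for i in positions) for row in CAR}
    return len(seen) == len(CAR)

def is_key(attrs):
    """Property 1 holds and no smaller, non-empty subset is still unique (property 2)."""
    if not is_unique(attrs):
        return False
    return not any(
        is_unique(subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )
```

For this data, `is_key(("REG_NUMBER",))` holds, `is_key(("MANUFACTURER",))` fails because Ford occurs twice, and `is_key(("REG_NUMBER", "MODEL"))` fails because it is not minimal. Note that MODEL alone also happens to be unique in these five tuples: a check on one snapshot of the data can only refute a key, whereas a real key is declared from the meaning of the data.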
2.9 Primary key
One of the keys is chosen as the primary key.
The attributes that make up the primary key are called primary attributes. A primary attribute cannot have a null value in any tuple.
The structure of a relation is described briefly by a relational schema, which consists of the relation name and a list of attribute names in parentheses. Primary attributes are underlined. For example, the relational schema of CAR (with REG_NUMBER as the primary attribute) looks like this:
CAR (REG_NUMBER, MANUFACTURER, MODEL, YEAR)
2.10 Translation of E-R schema into relational model
1. Translation of entity types: each entity type is represented by one relation. The attributes of the type become attributes of the relation, and the primary key of the entity becomes the primary key of the relation.
For example, the entity type STUDENT is represented by the relation
STUDENT (NO_STUDENT_COURSE_BOOK, NAME_STUDENT, ADDRESS, GENDER, …)
2. Translation of binary relationships: if entity type E2 has obligatory membership in an (N:1) relationship with type E1, then the relation for E2 should include the primary attributes of E1.
COURSE (NO_COURSE, NAME_DEPARTMENT, NAME, SEMESTER, …)
A key of one relation that is copied into another relation is called a foreign key in that other relation.
3. Translation of involuted relationships: the entity type PERSON with the (1:1) relationship MARRIED_TO is best represented by two relations:
PERSON (PIN, SURNAME_NAME, ADDRESS, …)
MARRIAGE (PIN_HUSBAND, PIN_WIFE, DATE_MARRIAGE)
4. Translation of entity subtypes: a subtype is represented by a relation which contains the primary attributes of the supertype and the attributes specific to the subtype. For example, a hierarchy of types is represented by the relations
PERSON (PIN, … attributes common to all types of persons …)
STUDENT (PIN, … attributes specific to students …)
TEACHER (PIN, … attributes specific to teachers …)
LECTURER (PIN, … attributes specific to lecturers …)
5. Translation of ternary relationships: a ternary relationship is represented by a relation which contains the primary attributes of all three entity types, together with any attributes of the relationship itself.
COMPANY (NAME_COMPANY, …)
PRODUCT (NAME_PRODUCT, …)
COUNTRY (NAME_COUNTRY, …)
EXPORTS (NAME_COMPANY, NAME_PRODUCT, NAME_COUNTRY)
2.11 Relational model normalisation
• A relational schema obtained from the E-R schema by the preceding rules can contain imperfections, which must be removed before implementation.
• The process by which the existing schema is modified is called normalisation.
• Normalisation is based on the concept of normal forms.
• The normal forms are the first normal form, the second normal form and so on, denoted 1NF, 2NF, …
2.12 First Normal Form
• Conditions of 1NF
1. Data are connected at the logical level, not the physical level (address on the disk)
2. Each entity type has one primary key
3. Each field within an entity has a unique name and is not repeated
The first two conditions are mandatory for relational databases, whereas the third condition is not satisfied in the following example (because the field COUNTY repeats):
Entity: Country
COUNTRY (ID_COUNTRY, NAME, CAPITAL_CITY, COUNTY1, COUNTY2, ...)
2.13 Second Normal Form
• Conditions of 2NF
1. The conditions of 1NF must be satisfied
2. Each field within an entity which is not part of the primary key functionally depends on the entire (composite) key
The second normal form matters mostly for primary keys composed of two or more fields (composite primary keys).
Entity: STUDENT
STUDENT (CODE_FACULTY, NO_STUDENT_COURSE_BOOK, NAME_STUDENT, …)
2.14 Third Normal Form
• Conditions of 3NF
1. The conditions of 2NF must be satisfied
2. No field depends transitively on any key
Transitivity: if a R b and b R c, then a R c.
Entity: County
COUNTY (COUNTY, COUNTRY, CAPITAL_CITY_COUNTRY, …)
The field CAPITAL_CITY_COUNTRY depends on the field COUNTRY, and the field COUNTRY depends on the field COUNTY.
The relation is not in third normal form, so it should be divided into two relations:
(COUNTY, COUNTRY) and (COUNTRY, CAPITAL_CITY_COUNTRY).
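The decomposition can be illustrated with a small Python sketch. The three sample rows are hypothetical and serve only as illustration; the point is that splitting COUNTY into the two relations stores each capital city once, and that joining the two relations on COUNTRY gives back the original relation.

```python
# Hypothetical sample rows for COUNTY(COUNTY, COUNTRY, CAPITAL_CITY_COUNTRY).
county = {
    ("Međimurje", "Croatia", "Zagreb"),
    ("Istria", "Croatia", "Zagreb"),
    ("Zala", "Hungary", "Budapest"),
}

# Decomposition into (COUNTY, COUNTRY) and (COUNTRY, CAPITAL_CITY_COUNTRY).
county_country = {(c, k) for c, k, _ in county}
country_capital = {(k, cap) for _, k, cap in county}

# Joining the two relations on COUNTRY reconstructs the original relation.
rejoined = {
    (c, k, cap)
    for c, k in county_country
    for k2, cap in country_capital
    if k == k2
}
```

In the original relation the fact "the capital of Croatia is Zagreb" is repeated for every Croatian county; after the decomposition it is stored in a single tuple of `country_capital`.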
2.15 Consequences of normal forms
• There are also higher normal forms
• Normal forms are a set of rules useful for modelling general database cases
• Complex data types can sometimes require compromises: deviations from the ideal solution
• Such complex data types occur very often in large and complex databases
• Result: object-relational databases
3 LANGUAGES FOR RELATIONAL
DATABASES
3.1 Relational algebra
- queries are written as algebraic expressions, built from relations and unary and binary operators
- each algebraic expression represents one query (search/browse)
- a simplified version of the University database will serve as an example:
STUDENT (ST_COURSE_BOOK, ST_NAME, ST_YEAR)
COURSE (CO_ID, CO_NAME, CO_TEA)
REPORT (RE_ID, RE_CO_ID, RE_RESULT)
TEACHER (TEA_NAME, TEA_ROOM)
STUDENT
ST_COURSE_BOOK | ST_NAME | ST_YEAR
---------------+---------+--------
876543         | Jones   | 2
864532         | Burns   | 1
856434         | Cairns  | 3
876421         | Hughes  | 2

REPORT
RE_ID  | RE_CO_ID | RE_RESULT
-------+----------+----------
876543 | 216      | 82
864532 | 216      | 75
864532 | 312      | 71
856434 | 121      | 49
876421 | 312      | 39
876543 | 251      | 70
864532 | 251      | 69
864532 | 121      | 78

COURSE
CO_ID | CO_NAME               | CO_TEA
------+-----------------------+-------
216   | Databases             | Black
312   | Programming           | Welsh
251   | Numerical mathematics | Quinn
121   | PAUP                  | Holt

TEACHER
TEA_NAME | TEA_ROOM
---------+---------
Black    | 1017
Welsh    | 1024
Holt     | 2014
Quinn    | 1010
3.1.1 Set operators
Relations are sets of tuples. Therefore, set operators can be applied to them.
Let R and S be relations. Then:
R union S ... the set of tuples which are in R or in S (or in both relations)
R intersect S ... the set of tuples which are in R and also in S
R minus S ... the set of tuples which are in R but not in S
To apply these operators, the relations R and S must be compatible (they must have the same degree and the same attributes, with the same names and types).
Note that the following always holds:
R intersect S = R minus (R minus S)
What can be concluded from this?
Example: consider the relation NEW_STUDENT

NEW_STUDENT
ST_COURSE_BOOK | ST_NAME | ST_YEAR
---------------+---------+--------
876542         | Smith   | 3
865698         | Turner  | 2
875923         | Murphy  | 2
856434         | Cairns  | 3
871290         | Noble   | 1

STUDENT union NEW_STUDENT
ST_COURSE_BOOK | ST_NAME | ST_YEAR
---------------+---------+--------
876543         | Jones   | 2
864532         | Burns   | 1
856434         | Cairns  | 3
876421         | Hughes  | 2
876542         | Smith   | 3
865698         | Turner  | 2
875923         | Murphy  | 2
871290         | Noble   | 1

STUDENT minus NEW_STUDENT
ST_COURSE_BOOK | ST_NAME | ST_YEAR
---------------+---------+--------
876543         | Jones   | 2
864532         | Burns   | 1
876421         | Hughes  | 2

STUDENT intersect NEW_STUDENT
ST_COURSE_BOOK | ST_NAME | ST_YEAR
---------------+---------+--------
856434         | Cairns  | 3
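Because relations are sets of tuples, the three operators map directly onto set operations in a programming language. The following Python sketch uses the STUDENT and NEW_STUDENT tuples from the example; representing a relation as a Python `set` of tuples is an assumption made for illustration.

```python
# Tuples are (ST_COURSE_BOOK, ST_NAME, ST_YEAR).
STUDENT = {
    (876543, "Jones", 2), (864532, "Burns", 1),
    (856434, "Cairns", 3), (876421, "Hughes", 2),
}
NEW_STUDENT = {
    (876542, "Smith", 3), (865698, "Turner", 2), (875923, "Murphy", 2),
    (856434, "Cairns", 3), (871290, "Noble", 1),
}

union = STUDENT | NEW_STUDENT        # STUDENT union NEW_STUDENT
intersect = STUDENT & NEW_STUDENT    # STUDENT intersect NEW_STUDENT
minus = STUDENT - NEW_STUDENT        # STUDENT minus NEW_STUDENT

# The identity R intersect S = R minus (R minus S), checked on this data.
via_minus = STUDENT - (STUDENT - NEW_STUDENT)
```

Here `union` has eight tuples (Cairns appears only once), `minus` has three, and `via_minus` equals `intersect`.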
3.1.2 Selection
Selection is a unary operator which selects from a relation those tuples which satisfy a given Boolean condition.
The selection on relation R under the Boolean condition β is written R where β.
The condition β is a formula consisting of:
• Operands, which are either constants or attributes
• Comparison operators =, <, >, ≤, ≥, ≠
• Logical operators and, or, not
3.1.3 Projection
Projection is a unary operator which selects the given attributes from a relation; duplicate tuples are eliminated from the resulting relation.
Examples:
1. Find the room numbers of all lecturers. In the bracket notation used in the examples that follow, this is TEACHER[TEA_ROOM].
TEA_ROOM
--------
1017
1024
2014
1010
2. Find the name of the lecturer who teaches course 312, i.e. (COURSE where CO_ID=312)[CO_TEA].
CO_TEA
------
Welsh
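Selection and projection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a relation is modelled as a list of dictionaries, the helper names `select` and `project` are hypothetical, and the attribute CO_NAME is omitted from COURSE for brevity.

```python
TEACHER = [
    {"TEA_NAME": "Black", "TEA_ROOM": 1017},
    {"TEA_NAME": "Welsh", "TEA_ROOM": 1024},
    {"TEA_NAME": "Holt", "TEA_ROOM": 2014},
    {"TEA_NAME": "Quinn", "TEA_ROOM": 1010},
]
COURSE = [
    {"CO_ID": 216, "CO_TEA": "Black"},
    {"CO_ID": 312, "CO_TEA": "Welsh"},
    {"CO_ID": 251, "CO_TEA": "Quinn"},
    {"CO_ID": 121, "CO_TEA": "Holt"},
]

def select(relation, condition):
    """R where β: keep the tuples for which the Boolean condition holds."""
    return [row for row in relation if condition(row)]

def project(relation, attrs):
    """Keep only the given attributes and eliminate duplicate tuples."""
    result = []
    for row in relation:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result

# Example 1: room numbers of all lecturers.
rooms = project(TEACHER, ["TEA_ROOM"])
# Example 2: name of the lecturer who teaches course 312.
teacher_312 = project(select(COURSE, lambda r: r["CO_ID"] == 312), ["CO_TEA"])
```

The duplicate check in `project` is what distinguishes relational projection from simply dropping columns: projecting two tuples that differ only in a dropped attribute yields a single tuple.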
3.1.4 Cartesian product
If R and S are relations of degrees n1 and n2, then the algebraic expression R times S gives the Cartesian product of R and S: a relation of degree n1 + n2 containing every combination of one tuple from R with one tuple from S.
Example:
1. For each student, list all the courses in which the student is not enrolled!
ALL_ENROLLED := STUDENT[ST_COURSE_BOOK] times COURSE[CO_ID],
NOT_ENROLLED := ALL_ENROLLED minus REPORT[RE_ID, RE_CO_ID]
Example:
2. Find all pairs of students in the same year
TEMP aliases STUDENT,
PAIRS := ((TEMP times STUDENT)
    where ((TEMP.ST_YEAR = STUDENT.ST_YEAR)
    and (TEMP.ST_COURSE_BOOK < STUDENT.ST_COURSE_BOOK)))
    [TEMP.ST_NAME, STUDENT.ST_NAME]
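Both Cartesian product examples can be traced on the sample data. The sketch below uses Python's standard `itertools.product` as the product operator; the variable names are illustrative, and relations are modelled as lists of tuples taken from the STUDENT, COURSE and REPORT tables.

```python
from itertools import product

# (ST_COURSE_BOOK, ST_NAME, ST_YEAR)
STUDENT = [
    (876543, "Jones", 2), (864532, "Burns", 1),
    (856434, "Cairns", 3), (876421, "Hughes", 2),
]
COURSE_IDS = [216, 312, 251, 121]
# (RE_ID, RE_CO_ID, RE_RESULT)
REPORT = [
    (876543, 216, 82), (864532, 216, 75), (864532, 312, 71), (856434, 121, 49),
    (876421, 312, 39), (876543, 251, 70), (864532, 251, 69), (864532, 121, 78),
]

# Example 1: every (student, course) pair minus the pairs found in REPORT.
all_pairs = set(product((s[0] for s in STUDENT), COURSE_IDS))
enrolled = {(student, course) for student, course, _ in REPORT}
not_enrolled = all_pairs - enrolled

# Example 2: product of STUDENT with itself, then selection and projection.
# The "<" comparison keeps each pair once and excludes a student paired with itself.
same_year = [
    (a[1], b[1])
    for a, b in product(STUDENT, STUDENT)
    if a[2] == b[2] and a[0] < b[0]
]
```

With this data, 16 pairs are possible, 8 of them appear in REPORT, so `not_enrolled` contains the remaining 8. The only pair of students in the same year is Hughes and Jones (both in year 2).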
3.1.5 Natural join
Natural join is a binary operator applicable to two relations R and S which have at least one common attribute. R join S consists of all tuples obtained by joining one tuple from R with one tuple from S, where the two tuples have the same values of the common attributes.
In the examples below, RE_ID and ST_COURSE_BOOK are treated as the same attribute, and likewise CO_TEA and TEA_NAME, although they are named differently in the sample tables.
Examples:
1. Names of all students enrolled in course 251!
QUERY1 := ((REPORT where RE_CO_ID=251) join STUDENT)[ST_NAME]
2. Find the room number of the lecturer who teaches course 312!
QUERY2 := ((COURSE where CO_ID=312) join TEACHER)[TEA_ROOM]
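A natural join can be sketched in Python as a nested loop over two relations that keeps the combinations agreeing on all shared attribute names. This is a minimal illustration, not an efficient implementation; the helpers `natural_join` and `rename` are hypothetical, and the renaming step is an assumption needed because the sample tables name the student number differently in REPORT (RE_ID) and STUDENT (ST_COURSE_BOOK).

```python
def natural_join(r, s):
    """Combine rows of r and s that agree on every attribute name they share."""
    result = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[k] == b[k] for k in common):
                result.append({**a, **b})
    return result

def rename(relation, old, new):
    """Return a copy of the relation with attribute old renamed to new."""
    return [{(new if k == old else k): v for k, v in row.items()} for row in relation]

STUDENT = [
    {"ST_COURSE_BOOK": 876543, "ST_NAME": "Jones"},
    {"ST_COURSE_BOOK": 864532, "ST_NAME": "Burns"},
    {"ST_COURSE_BOOK": 856434, "ST_NAME": "Cairns"},
    {"ST_COURSE_BOOK": 876421, "ST_NAME": "Hughes"},
]
REPORT = [
    {"RE_ID": 876543, "RE_CO_ID": 216}, {"RE_ID": 864532, "RE_CO_ID": 216},
    {"RE_ID": 864532, "RE_CO_ID": 312}, {"RE_ID": 856434, "RE_CO_ID": 121},
    {"RE_ID": 876421, "RE_CO_ID": 312}, {"RE_ID": 876543, "RE_CO_ID": 251},
    {"RE_ID": 864532, "RE_CO_ID": 251}, {"RE_ID": 864532, "RE_CO_ID": 121},
]

# QUERY1: names of all students enrolled in course 251.
report_251 = [row for row in REPORT if row["RE_CO_ID"] == 251]
joined = natural_join(rename(report_251, "RE_ID", "ST_COURSE_BOOK"), STUDENT)
names = sorted({row["ST_NAME"] for row in joined})
```

For the sample data `names` contains Burns and Jones. If the two relations shared no attribute name at all, `natural_join` would degenerate into the Cartesian product, which is why the renaming matters.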
3.1.6 Other operators
1. Theta join: a combination of the Cartesian product and selection
2. Division: written divideby
3. Outer join: written outerjoin. It also keeps the tuples that have no matching tuple in the other relation, so it is used for finding data which do not satisfy a certain condition
Practice:
Find the names of the students who did not enrol in any course!
3.2 Relational calculus
A query is stated as a predicate which the tuples of the result must satisfy.
There are two types:
1. Tuple-oriented calculus (where tuples are the basic objects)
2. Domain-oriented calculus (where attribute domains are the basic objects)
3.3 SQL Language
A query is expressed with the flexible SELECT command. The result of a query is considered a new temporary relation, derived from the permanent ones.
Basic structure of an SQL query:
SELECT attributes
FROM relation
WHERE condition;
The commands INSERT, UPDATE and DELETE are used to enter, change and delete data.
Examples:
QUERY1: Find the numbers and names of all students in year 1.
QUERY2: Find the numbers and names of the students who enrolled in course 121.
QUERY3: Find all pairs of student numbers which refer to the same year of study.
QUERY4: Find all data about the students who did not enrol in course 121.
Remark:
To solve these tasks, use the relations introduced at the beginning of this unit.
4 SQL LANGUAGE FOR
RELATIONAL DATABASES
4.1 Introduction to SQL Language
4.1.1 Data types
When defining a relation, a type has to be specified for each attribute.
An attribute may be a date, a number, a name, a text, an IP address, a logical value (true/false) and so on.
These are some of the more important and more often used attribute types:
INT
Integer, usually 4 bytes, although this depends on the DBMS.
BIGINT
Integer stored in 8 bytes.
SMALLINT
Integer stored in 2 bytes.
REAL
Floating-point number stored in 4 bytes.
NUMERIC (p,s)
Decimal number with user-specified precision p and scale s.
4.1.1 Data types (cont.)
BOOLEAN
True/false (TRUE/FALSE, t/f, y/n, 1/0, ...)
CHAR(n)
String: a sequence of letters, digits and other characters of fixed length n.
VARCHAR(n)
String: a sequence of letters, digits and other characters of maximum length n.
TEXT
String of arbitrary length (MEMO)
DATE
Date, e.g. 2002-10-21
TIME
Time, e.g. 04:05:06
TIMESTAMP
Date and time, e.g. 1999-01-08 04:05:06
4.1.2 Definition of a relation
A relation is defined with the SQL command CREATE TABLE, followed by the relation name and a list of attribute definitions in parentheses, separated by commas. Each attribute is defined by its name, followed by its type and any other attribute properties.
Example of defining the relation STUDENT:
CREATE TABLE STUDENT(
surname VARCHAR(50),
name VARCHAR(50),
student_course_book VARCHAR(10),
year INT,
module VARCHAR(10),
PRIMARY KEY (student_course_book)
);
The PRIMARY KEY clause, which defines the key of the relation, is placed at the end of the relation definition.
The second example defines the relation which contains course data (name and surname refer to the teacher, title to the course; a relation cannot have two attributes with the same name):
CREATE TABLE COURSE (
name VARCHAR(50),
surname VARCHAR(50),
kid INT,
title VARCHAR(100),
hours1 INT,
exercises1 INT,
hours2 INT,
exercises2 INT,
PRIMARY KEY(kid)
);
A relation is deleted from the database with the command
DROP TABLE <name_relation>.
4.1.3 Data input
New data are entered into a relation with the SQL command INSERT INTO, followed by the relation name and, optionally, a list of attribute names in parentheses; then follows the word VALUES and a list of attribute values in parentheses.
Here is an example of inserting students into the relation STUDENT:
INSERT INTO STUDENT VALUES ('ANTOLIĆ','ANITA','F-1961',2,'pfi');
INSERT INTO STUDENT VALUES ('ANTOLKOVIĆ','VLATKA','F-1761',2,'pfi');
INSERT INTO STUDENT VALUES ('BABIĆ','GORDAN','F-2523',1,'pfi');
The list of attribute names does not have to be specified if the list of values follows the order of attributes in the relation definition.
Text values are enclosed in quotation marks; numeric values are not.
If no value is given for an attribute, the attribute takes its default value, which is specified when the relation is defined, or otherwise the NULL value (empty).
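The behaviour described above can be tried out with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the slides do not name a DBMS, SQLite is used here only because it needs no server, the column is called `student_course_book`, and the `DEFAULT 1` on `year` is added purely to illustrate default values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # temporary in-memory database
conn.execute("""
    CREATE TABLE student (
        surname VARCHAR(50),
        name VARCHAR(50),
        student_course_book VARCHAR(10),
        year INT DEFAULT 1,          -- assumed default, for illustration only
        module VARCHAR(10),
        PRIMARY KEY (student_course_book)
    )
""")

# All values given, in the order of the relation definition.
conn.execute("INSERT INTO student VALUES ('ANTOLIĆ', 'ANITA', 'F-1961', 2, 'pfi')")

# Only some attributes named: year takes its default, module becomes NULL.
conn.execute(
    "INSERT INTO student (surname, name, student_course_book) "
    "VALUES ('BABIĆ', 'GORDAN', 'F-2523')"
)
row = conn.execute(
    "SELECT year, module FROM student WHERE student_course_book = 'F-2523'"
).fetchone()

# A second tuple with an existing primary key value is rejected.
try:
    conn.execute("INSERT INTO student VALUES ('X', 'Y', 'F-1961', 1, 'pfi')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

After the second insert, `row` is `(1, None)`: the default value for `year` and NULL (Python `None`) for `module`.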
4.1.4 Update/change of data
Data are updated with the SQL command UPDATE.
Example:
Problem: Change the first name of prof. K. Furić, who teaches course No. 2362, to "Krešimir"!
UPDATE course SET name='Krešimir' WHERE kid=2362;
After the word SET comes a comma-separated list of the attributes to be updated, each with its new value.
4.1.5 Deleting data
Data are deleted with the command DELETE FROM.
Example:
Problem: Delete the elective course "History of Informatics" (2404) from the PFI study programme!
DELETE FROM study_pfi WHERE kid=2404;
4.1.6 Transactions
A transaction begins with the SQL command BEGIN, followed by a sequence of SQL statements which change or read the data. The sequence must end either with the command COMMIT, which makes the changes permanent, or with the command ROLLBACK, which undoes all the statements in the sequence.
BEGIN TRANSACTION;
UPDATE STUDENT SET student_course_book='F-3342' WHERE student_course_book='F-3343';
UPDATE LECTURE SET student_course_book='F-3342' WHERE student_course_book='F-3343';
ROLLBACK TRANSACTION;
or
BEGIN TRANSACTION;
DELETE FROM STUDENT WHERE student_course_book='F-3343';
INSERT INTO STUDENT VALUES ('Nikić','Nikša','F-3342',4,'pfi');
UPDATE LECTURE SET student_course_book='F-3342' WHERE student_course_book='F-3343';
COMMIT TRANSACTION;
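The difference between ROLLBACK and COMMIT can be observed with Python's `sqlite3` module. The sketch assumes SQLite as the DBMS, simplified two-column versions of STUDENT and LECTURE, and a single hypothetical student; `isolation_level=None` is passed so that the transaction boundaries are exactly the BEGIN, ROLLBACK and COMMIT statements written in the code.

```python
import sqlite3

# isolation_level=None: the module starts no transactions on its own.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE student (student_course_book TEXT PRIMARY KEY, surname TEXT)")
conn.execute("CREATE TABLE lecture (student_course_book TEXT, kid INT)")
conn.execute("INSERT INTO student VALUES ('F-3343', 'Nikić')")
conn.execute("INSERT INTO lecture VALUES ('F-3343', 1224)")

RENUMBER = (
    "UPDATE student SET student_course_book = 'F-3342' WHERE student_course_book = 'F-3343'",
    "UPDATE lecture SET student_course_book = 'F-3342' WHERE student_course_book = 'F-3343'",
)

def book_numbers():
    """Current student numbers stored in the student relation."""
    return [r[0] for r in conn.execute("SELECT student_course_book FROM student")]

# First attempt: both updates are undone by ROLLBACK.
conn.execute("BEGIN TRANSACTION")
for statement in RENUMBER:
    conn.execute(statement)
inside = book_numbers()
conn.execute("ROLLBACK TRANSACTION")
after_rollback = book_numbers()

# Second attempt: both updates become permanent with COMMIT.
conn.execute("BEGIN TRANSACTION")
for statement in RENUMBER:
    conn.execute(statement)
conn.execute("COMMIT TRANSACTION")
after_commit = book_numbers()
```

Inside the first transaction the new number F-3342 is visible, after ROLLBACK the old number F-3343 is back, and after COMMIT the change stays. Grouping the two updates in one transaction ensures that STUDENT and LECTURE are never left referring to different numbers.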
4.1.7 Queries
A query is defined with the command SELECT. To obtain only distinct tuples, the command SELECT DISTINCT should be used.
Example of a simple query
Problem: Find the student course book numbers and names of all first-year students!
SELECT student_course_book, name, surname
FROM STUDENT
WHERE year=1
ORDER BY surname, name;

 student_course_book | name     | surname
---------------------+----------+-----------
 F-2523              | GORDAN   | BABIC
 F-2506              | KRESIMIR | BACIC
 F-2271              | DAMIR    | BAKMAZ
 F-2143              | TIBOR    | BALI
 F-2144              | IVAN     | BEDNJANEC
 F-2356              | BRUNO    | BLAZINIC
 .....
 F-2561              | DEJAN    | ZIKOVIC
(69 rows)
Problem:
Find the student course book numbers and names of the students who signed up for course No. 1224!
SELECT student.name, student.surname, student.student_course_book
FROM STUDENT, LECTURE
WHERE student.student_course_book=lecture.student_course_book
AND lecture.kid=1224
ORDER BY student.surname;

 name     | surname   | student_course_book
----------+-----------+---------------------
 AMIR     | EL-OCH    | F-2025
 IVAN     | GLADOVIC  | F-1823
 MARIO    | KLOKOCKI  | F-1851
 MARIN    | KOSOVIC   | F-1830
 VEDRAN   | KRALJ     | F-1972
 ...
 ZRINKA   | SUMANOVAC | F-1789
 KRESIMIR | VURNEK    | F-2023
(17 rows)
The same result can be obtained in a different way, by using a subquery:
SELECT name, surname, student_course_book
FROM STUDENT
WHERE student_course_book IN (SELECT student_course_book
FROM LECTURE
WHERE kid=1224)
ORDER BY surname;
IN means membership of a set. The query is a nested SELECT: one SELECT command within another SELECT.
There is another alternative, using the natural join of the two relations:
SELECT name, surname, student_course_book FROM STUDENT
NATURAL JOIN LECTURE
WHERE kid=1224 ORDER BY surname;
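That the three formulations return the same answer can be checked on a tiny database. The sketch uses Python's `sqlite3` module and assumes SQLite as the DBMS; two of the students come from the result table above, while the third student and all LECTURE rows are hypothetical test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (name TEXT, surname TEXT, student_course_book TEXT PRIMARY KEY);
    CREATE TABLE lecture (student_course_book TEXT, kid INT);
    INSERT INTO student VALUES ('IVAN', 'GLADOVIC', 'F-1823'),
                               ('AMIR', 'EL-OCH', 'F-2025'),
                               ('ANA', 'ANIC', 'F-9999');
    INSERT INTO lecture VALUES ('F-1823', 1224), ('F-2025', 1224), ('F-9999', 2362);
""")

# Cartesian product of the two relations restricted by WHERE.
product_query = """
    SELECT student.name, student.surname, student.student_course_book
    FROM student, lecture
    WHERE student.student_course_book = lecture.student_course_book
      AND lecture.kid = 1224
    ORDER BY student.surname"""

# Nested SELECT with IN.
subquery = """
    SELECT name, surname, student_course_book
    FROM student
    WHERE student_course_book IN
          (SELECT student_course_book FROM lecture WHERE kid = 1224)
    ORDER BY surname"""

# Natural join on the shared attribute student_course_book.
natural_query = """
    SELECT name, surname, student_course_book
    FROM student NATURAL JOIN lecture
    WHERE kid = 1224
    ORDER BY surname"""

results = [conn.execute(q).fetchall() for q in (product_query, subquery, natural_query)]
```

All three entries of `results` are the same list of two students. The equivalence relies on LECTURE containing each (student, course) pair at most once; with duplicate rows the two join formulations would return duplicates, while the IN formulation would not.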
Problem: Find the student course book numbers and names of the students who signed up for at least one course lectured by professor Androic!
As in the previous example there are several alternatives, one using the Cartesian product of three relations and another with nested SELECT commands:
SELECT student.name, student.surname, student.student_course_book
FROM STUDENT, LECTURE, COURSE
WHERE student.student_course_book=lecture.student_course_book
AND lecture.kid=course.kid
AND course.surname='Androic';
or
SELECT name, surname, student_course_book
FROM STUDENT
WHERE student_course_book IN (SELECT student_course_book
FROM LECTURE
WHERE kid IN (SELECT kid FROM COURSE
WHERE surname='Androic'));
Problem: Find all pairs of course book numbers of the students in the same year!
SELECT t1.student_course_book, t2.student_course_book
FROM STUDENT t1, STUDENT t2
WHERE t1.student_course_book < t2.student_course_book
AND t1.year=t2.year;
Here we have introduced aliases (alternative names), t1 and t2, for the relation STUDENT.
The aliases are needed because the query forms the Cartesian product of the relation STUDENT with itself, so the attribute names would otherwise be ambiguous.
The word AS can be placed in front of an alias.
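The self-join can be run on a handful of rows with Python's `sqlite3` module. In this sketch SQLite is an assumed DBMS, the relation is reduced to two attributes, and the years assigned to the five student numbers are sample data chosen for illustration; the optional word AS is written out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_course_book TEXT PRIMARY KEY, year INT);
    INSERT INTO student VALUES ('F-2523', 1), ('F-2506', 1),
                               ('F-1961', 2), ('F-1761', 2),
                               ('F-1823', 3);
""")

# t1 and t2 are two aliases for the same relation.
pairs = conn.execute("""
    SELECT t1.student_course_book, t2.student_course_book
    FROM student AS t1, student AS t2
    WHERE t1.student_course_book < t2.student_course_book
      AND t1.year = t2.year
    ORDER BY t1.student_course_book
""").fetchall()
```

The condition `t1.student_course_book < t2.student_course_book` removes pairs of a student with itself and lists each remaining pair only once; the student in year 3 has no partner and does not appear.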
5 PHYSICAL DATABASE
STRUCTURE
5.1 Physical structure elements
Data are stored on magnetic disks.
To understand the physical database structure one should know what the following are:
• record
• file
• pointer
These represent a very low level of abstraction, very close to the actual storage.
[Figure: University data file]
5.1.1 External computer memory
• The OS divides the external memory into blocks of fixed size, e.g. 512 bytes or 4096 bytes
• Each block is uniquely identified by its address
• The basic external memory operation is the transfer of a block with a given address from external memory to main memory and vice versa
• The part of main memory which takes part in the transfer is called a buffer
• A block is the smallest amount of data which can be transferred
• The time needed for a transfer is not constant; it depends on the position of the disk head
• Time for an external memory operation: measured in ms
• Time for a main memory operation: measured in ns
[Figure: File in blocks with related records]
5.1.2 Files
• At this level a file is a finite sequence of records of the same type stored in external memory
• A record type is defined as a tuple of basic data items (each described by a name and a type)
• Records are of fixed length (each data item of a record is represented by a fixed number of bytes)
Typical operations performed on files are:
- insert a new record
- modify a record
- remove a record
- find the record or records in which given data items have given values
5.1.3 External memory files
• A record is usually smaller than a block (several records fit in a block)
• A record address is an ordered pair (block address, offset within the block)
• In some file organisations, some spaces in a block may be left empty
• How can full and empty spaces be distinguished?
• Extend the record with one bit, which denotes whether the space is "empty" or "full"
• It is sometimes necessary to "delete" a record but show that its place is still taken; this needs one more bit in the record, which shows whether the record is "valid" or "not valid"
• The entire file usually occupies several blocks
• Because records are inserted and deleted, the external memory becomes fragmented
5.1.4 Pointers
• A pointer is a value within a record which points to another record (in the same or another file)
• A pointer can be:
  • A record address ("physical")
  • A primary key value ("logical")
• Pointers make it possible to link records
• A pointer-address enables fast access
• A pointer-key is "slow": it only implicitly identifies a record, which still has to be found
• The presence of a pointer-address may cause problems with file reorganisation or update
• If a pointer-address points to a record, the record is "pinned"
• The presence of a pointer-key does not cause that kind of trouble, so such pointers are used despite their "slowness"
5.1.5 Physical database structure
• The entire database is structured as a set of files
• Database records can be connected to one another with pointers
• All database operations are operations on the database files
• In a relational database, every relation is represented by one file
• The attributes of the relation correspond to the basic data items of a record
• One tuple of the relation is represented by one record
• The primary key of the relation defines the primary key of the file
• The physical structure of a relational database can also contain additional ancillary files which make searching and connecting the data faster. Indexes are an example of such files.
5.2 File access based on the primary key
• A very important operation on files is access based on the primary key.
• The task is to determine the address of the record (there is at most one) which contains the given value of the primary key.
• The following slides present ways of implementing a file, with the corresponding data organisation, that support access based on the primary key.
5.2.1 Simple file
• There is no structure of any kind
• Data records are put in as many blocks as needed
• The blocks that form a file may be connected:
  1. in a linked list (every block contains the address of the following block)
  2. by an address table of all blocks (which occupies the first block or the first few blocks)
• Searching for the record with a given key value requires reading the entire file
• A new record is placed in the first free space in the first unfilled block
• Records can be modified without any limitations
5.2.2 Hash file
• File records are put into P boxes (buckets), numbered 0, 1, 2, …,
P-1.
• Each box consists of one or more blocks
• A given hash function h yields the number h(k) of the box in which the
record with key value k should be located.
• The set of possible key values is usually much larger than the number
of boxes
• h should distribute key values over the boxes uniformly
• Example of a good hash function:
• Key values are treated as bit sequences of a fixed length;
• The bit sequence is divided into fixed-length groups; zeros are
added to the last group if needed;
• The groups are added up as integers;
• The sum is divided by the number of boxes;
• The remainder after division is the number of the required box.
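The steps above can be sketched in Python. This is only an illustration under stated assumptions: the function name `fold_hash`, the 16-bit group length and the use of a byte string as the key are choices made for the example, not part of the original slides.

```python
def fold_hash(key: bytes, boxes: int, group_bits: int = 16) -> int:
    """Fold the bits of the key into fixed-length groups, add them, take the remainder."""
    # View the key as one bit sequence.
    bits = "".join(f"{byte:08b}" for byte in key)
    # Pad the last group with zeros so that every group has group_bits bits.
    if len(bits) % group_bits:
        bits += "0" * (group_bits - len(bits) % group_bits)
    # Add the groups as integers.
    total = sum(int(bits[i:i + group_bits], 2)
                for i in range(0, len(bits), group_bits))
    # The remainder after division by the number of boxes is the box number.
    return total % boxes
```

Calling `fold_hash(b"ABC", 7)` always returns a box number between 0 and 6, whatever the key length.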
5.2.3 Index file
• An index is a small ancillary file which facilitates searching in
a large (main) file.
Two variants of index files will be presented:
a. Index sequential file organisation
b. Index-direct file organisation
a. Index sequential file organisation
• Records in the main file must be sorted by key value
• Blocks do not have to be completely filled
• A so-called sparse index is added
• Each index record corresponds to one block of the main
file
• An index record has the form (k, a):
k – the smallest key value in the particular block
a – block address
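A lookup through such an index can be sketched as follows. The sample blocks, the variable names and the use of list positions as block addresses are assumptions made for this illustration.

```python
from bisect import bisect_right

# Hypothetical main file: blocks sorted by key, each block a list of (key, data).
blocks = [
    [(3, "c"), (5, "e")],
    [(8, "h"), (12, "l")],
    [(20, "t"), (27, "x")],
]
# One index record (k, a) per block: k = smallest key in the block, a = block number.
index = [(block[0][0], a) for a, block in enumerate(blocks)]

def lookup(key):
    """Return the data stored under key, or None if the key is absent."""
    keys = [k for k, _ in index]
    pos = bisect_right(keys, key) - 1      # last index record with k <= key
    if pos < 0:
        return None                        # key is smaller than every stored key
    _, a = index[pos]
    for k, data in blocks[a]:              # only one block of the main file is read
        if k == key:
            return data
    return None
```

Because the index holds one record per block, it stays small, and a binary search in the index followed by one block read is enough to find a record.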
b. Index-direct file organisation
• Allows records in the main file to be stored in arbitrary
order
• A so-called dense index is added
• Each index record corresponds to one record of the
main file
• An index record has the form (k, a):
k – key value of a particular main file record,
a – pointer-address of that main file record
• The main file is not sorted
• The index is sorted by key
5.2.4 B-tree
• To search large main files efficiently, today’s DBMSs use
a B-tree
• A B-tree of order m is an m-ary tree with the following properties:
  • The root is either a leaf or has at least two children
  • Each node, except the root and the leaves, has between m/2
  and m children
  • All paths from the root to a leaf are of the same length
Let a B-tree contain in its leaves n pairs of the form (k, a).
Let one leaf contain on average b pairs (k, a).
The number of block reads needed for key-based access
is proportional to the tree height, which is at most about
$\log_{m/2}(n/b)$. In practice m and b can be large, so the B-tree
usually has only a few (3-4) levels.
Conclusion: access through a B-tree index is approximately as fast as
hashed file access
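The height estimate can be checked with a small calculation. The sketch below assumes the least favourable case, in which every inner node has only the minimum of m/2 children; the function name and the sample numbers are illustrative assumptions.

```python
def levels_above_leaves(n: int, b: int, m: int) -> int:
    """Number of index levels above the leaf level when inner nodes have m/2 children."""
    if m < 4:
        raise ValueError("the sketch needs at least two children per inner node")
    leaves = -(-n // b)          # ceil(n / b): number of leaf blocks
    fanout = m // 2              # minimum number of children of an inner node
    levels, reachable = 0, 1
    while reachable < leaves:    # each extra level multiplies the reachable leaves
        reachable *= fanout
        levels += 1
    return levels
```

With one million pairs, 100 pairs per leaf and order m = 200, `levels_above_leaves(1_000_000, 100, 200)` gives two levels above the leaves, which matches the remark that a B-tree rarely needs more than a few levels.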
Figure: inserting value 23 into a B-tree
6 IMPLEMENTATION OF RELATIONAL OPERATIONS
6.1 Implementation of natural join
So far we have been talking about the “static” aspect of the physical
database structure.
A working relational database also has a “dynamic” aspect, i.e. the
evaluation of relational algebra expressions.
Within a relational DBMS, an algebraic expression is evaluated step by step, and
the basic step is the evaluation of a single algebraic operation.
We will discuss the following three topics:
1. Natural join
2. Selection and projection
3. Optimal evaluation of the entire expression
We consider relations R1(A,B) and R2(B,C) with the
common attribute B.
Let S(A,B,C) denote the natural join of R1 and R2.
Each of these three relations is physically represented by one file of the
same name.
We are going to consider several ways to generate file S from files R1
and R2.
6.1.1 Nested loop algorithm
It is the most obvious, although not necessarily the most
efficient way. The idea is:
initialise empty S;
load the first record from R1;
while (have not reached the end of R1) {
  load the first record from R2;
  while (have not reached the end of R2) {
    if (the current records from R1 and R2 contain the same
        value for B)
      create a composite record and write it into S;
    try to load the next record from R2;
  }
  try to load the next record from R1;
}
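The same idea can be written as a short runnable sketch. Records are modelled as Python tuples and the files as in-memory lists; these representations, the function name and the sample data are assumptions made for illustration.

```python
def nested_loop_join(r1, r2):
    """r1: records (a, b); r2: records (b, c). Returns the records (a, b, c) of S."""
    s = []
    for a, b1 in r1:              # outer loop: one pass over R1
        for b2, c in r2:          # inner loop: a full pass over R2 per R1 record
            if b1 == b2:          # same value for the common attribute B
                s.append((a, b1, c))
    return s

R1 = [("a1", 1), ("a2", 2), ("a3", 2)]
R2 = [(2, "c1"), (3, "c2"), (2, "c3")]
S = nested_loop_join(R1, R2)
```

The inner loop runs once for every record of R1, so the amount of work grows with the product of the two file sizes.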
6.1.2 Algorithm based on sorting and merging
Suppose that files R1 and R2 are sorted in ascending order by the
common attribute B. File S, which represents the natural join of R1 and
R2, can be generated by the following algorithm (a “set of records” here
means all consecutive records with the same value for B):
initialise empty S;
load the first set of records from R1;
load the first set of records from R2;
while (have not reached the end of R1 nor of R2) {
  if (the current set of records from R1 has a lower value for B
      than the current set of records from R2)
    try loading the next set of records from R1;
  otherwise if (the current set of records from R2 has a lower
      value for B than the current set of records from R1)
    try loading the next set of records from R2;
  otherwise {
    combine each record from the current set of records from R1
    with each record from the current set of records from R2 and write
    all generated records into S;
    try loading the next set of records from R1;
    try loading the next set of records from R2;
  }
}
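A compact sketch of this algorithm follows. It assumes both inputs are lists of tuples already sorted by B and uses `itertools.groupby` to form the sets of records with equal B; the function name and data layout are illustrative assumptions.

```python
from itertools import groupby

def merge_join(r1, r2):
    """r1: (a, b) records sorted by b; r2: (b, c) records sorted by b."""
    g1 = groupby(r1, key=lambda rec: rec[1])   # sets of records with equal B in R1
    g2 = groupby(r2, key=lambda rec: rec[0])   # sets of records with equal B in R2
    end = (None, None)
    s = []
    k1, set1 = next(g1, end)
    k2, set2 = next(g2, end)
    while set1 is not None and set2 is not None:
        if k1 < k2:
            k1, set1 = next(g1, end)           # R1 is behind: advance R1
        elif k2 < k1:
            k2, set2 = next(g2, end)           # R2 is behind: advance R2
        else:
            right = list(set2)                 # equal B: combine every pair
            s.extend((a, b, c) for a, b in set1 for _, c in right)
            k1, set1 = next(g1, end)
            k2, set2 = next(g2, end)
    return s
```

Each input is read exactly once, which is why this method is attractive when the files are already sorted.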
Conclusion:
If R1 and R2 are not sorted at the beginning, they first
need to be sorted, and only then can the natural join be
calculated.
For smaller files that fit into main memory this is not
a problem; large files, however, need to be
divided into segments.
Sorting the initial R1 and R2 takes considerably longer
than generating S from the already sorted R1 and R2.
Even so, this procedure is efficient when R1 and R2 are very big.
6.1.3 Index based algorithm
Suppose that one of the files R1 and R2, e.g. R2, has a
secondary index on the common attribute B.
Then file S, which contains the natural join, can be generated in
the following way:
initialise empty S;
load the first record from R1;
while (have not reached the end of R1) {
  use the index to find and load all records from R2 which
  have the same value for B as the current record from R1;
  combine the current record from R1 with each of the loaded
  records from R2 and write the generated records into S;
  try loading the next record from R1;
}
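The loop above can be sketched in Python, with a dictionary standing in for the secondary index on B. The dictionary, the function name and the tuple layout of the records are assumptions made for this illustration.

```python
from collections import defaultdict

def index_join(r1, r2):
    """r1: records (a, b); r2: records (b, c) with a secondary index on b."""
    # Stand-in for the secondary index: value of B -> positions of records in R2.
    index = defaultdict(list)
    for pos, (b, _) in enumerate(r2):
        index[b].append(pos)
    s = []
    for a, b in r1:                       # R1 is read sequentially, once
        for pos in index.get(b, ()):      # only matching R2 records are fetched
            s.append((a, b, r2[pos][1]))
    return s
```

Records of R2 whose value of B never occurs in R1 are not touched at all during the join loop.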
Conclusion:
The algorithm reads the entire R1 once, but
from R2 it reads directly only those records that participate in
the natural join.
That can lead to significant savings in the amount of
work.
If both R1 and R2 have an index on B, then the smaller
file should be read sequentially and the index of
the bigger one should be used.
6.1.4 Algorithm based on hash function and classification
A hash function h which depends on the common attribute B is given.
The combination of a record from file R1 with a
record from file R2 appears in the natural join S if and only if
both records have the same value for B.
Therefore, the hash function gives the same value for both such
records.
By classifying the records from R1 and R2 into groups with
similar values of h, it becomes easier to determine the pairs
which may be combined.
Suppose R1 is smaller than R2. The algorithm consists of 5 steps as
follows:
1. Initialise an empty file S. Select a hash function h. Divide the total
range of hash values into k intervals of similar size. k is selected in such a
way that 1/k of file R1 fits into main memory.
2. Read R1 sequentially and classify its records into k groups, so that
one group contains all records from R1 which h maps into
one interval.
3. Read R2 sequentially and classify its records into k groups, in the same
way as was done with file R1.
4. Select one of the intervals for the value of h. Load the corresponding record
group from R1 into main memory. Read sequentially the corresponding
record group from R2. Combine the current record from R2 with all
records from R1 which have the same value for B. Write the obtained
records into S.
5. Repeat step 4 with a new interval for the value of h until all k intervals are processed.
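The five steps can be sketched as below. Two simplifications are assumptions of this example: groups are formed with Python's built-in `hash` taken modulo k instead of by dividing the hash range into intervals, and the groups are kept in memory lists rather than written to files.

```python
def classify(records, key_of, k):
    """Steps 2 and 3: distribute records into k groups by the hash of B."""
    groups = [[] for _ in range(k)]
    for rec in records:
        groups[hash(key_of(rec)) % k].append(rec)
    return groups

def hash_join(r1, r2, k=4):
    """r1: records (a, b), the smaller file; r2: records (b, c)."""
    s = []                                          # step 1: empty S
    groups1 = classify(r1, lambda rec: rec[1], k)
    groups2 = classify(r2, lambda rec: rec[0], k)
    for g1, g2 in zip(groups1, groups2):            # steps 4 and 5: one group pair at a time
        in_memory = {}
        for a, b in g1:                             # load the R1 group into memory
            in_memory.setdefault(b, []).append(a)
        for b, c in g2:                             # read the R2 group sequentially
            for a in in_memory.get(b, ()):
                s.append((a, b, c))
    return s
```

Records with equal B always land in groups with the same number, so only corresponding groups need to be compared.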
6.1.5 Natural join implementation: conclusion
Comparison of the four presented methods for the
implementation of the natural join:
• If the files R1 and R2 are already sorted by the common
attribute B, the most efficient algorithm is the one based on
sorting and merging
• If one of the files is small enough to fit into main memory,
the nested loop algorithm should be selected
• If one file is considerably bigger than the other and has
a suitable index, the best algorithm is the index-based
one.
• For large files R1 and R2 without an index, the best
algorithm is the one based on a hash function and
classification.
6.2 Implementation of selection, projection and other operations
Apart from the natural join, the most important relational
operations are selection and projection.
In this part we present the main ideas for the
implementation of these two operations, and then briefly
mention how other operations are implemented.
6.2.1 Implementation of selection
A relation R and a Boolean condition β are given.
R is physically represented by a file of the same name, in the
standard way.
The implementation of the selection R where β depends on the
form of the condition β, but usually it comes down to searching
file R for records with a given value of some data item.
So the implementation usually uses either access based on the primary key or
access based on other data.
6.2.2 Implementation of projection
A relation R and its attribute A are given. R is physically represented by
a file of the same name in the standard way. In order to generate
the file that corresponds to S = R[A], the whole file R obviously
has to be read and all values of A that appear in it extracted. The same
value of A may appear more than once. The basic problem of
implementing projection is how to eliminate duplicate records in
S.
The simplest algorithm for implementing a
projection is based on nested loops. The outer loop reads file R,
and the inner loop passes through the part of file S created so far
to check whether the current value is already there.
If R is large, the algorithm with nested loops requires too much
time.
Then it is better to extract all values of A which appear in R and
then sort the sequence of extracted values. Duplicates are then eliminated
in one sequential read, because equal values are adjacent.
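The sort-based variant can be sketched as follows. Records are modelled as dictionaries and the function name is a hypothetical choice for this example.

```python
def project(r, attribute):
    """Return the distinct values of one attribute, using sorting to find duplicates."""
    values = sorted(rec[attribute] for rec in r)   # extract all values of A and sort them
    s = []
    for v in values:                               # one sequential pass
        if not s or s[-1] != v:                    # a duplicate is always next to its twin
            s.append(v)
    return s

R = [{"A": "x", "B": 1}, {"A": "y", "B": 2}, {"A": "x", "B": 3}]
S = project(R, "A")
```

After sorting, a value only has to be compared with the one written last, so the inner loop of the nested-loop variant disappears.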
6.2.3 Implementation of other operations
The Cartesian product of two relations R1 and R2 is implemented
by a nested loop.
The union of two relations R1 and R2 is implemented in the
obvious way; as with projection, duplicate records
must be eliminated.
The intersection of two relations can be understood as a
special case of a natural join where all attributes are common.
Other relational operators can be expressed using the ones already
presented, therefore they are usually not implemented
separately.
6.3 Optimal evaluation of algebraic expressions
The evaluation of an expression comes down to the evaluation of each
basic operation. These are usually natural join,
selection, projection and perhaps some others.
For each basic operation the DBMS has several algorithms available.
Using parameters such as the cardinality of the relations, the size
of the main memory and the existence or non-existence of certain indices,
the DBMS estimates, for the given operation and for each of the available
algorithms, the time that algorithm would need to perform the
operation.
The estimates are based on built-in heuristic rules. The DBMS
then selects the algorithm with the smallest estimated time for the
given operation.
In this respect a relational DBMS resembles an expert system, i.e.
software which has some characteristics of artificial
intelligence.
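The selection step can be illustrated with a toy estimator for the natural join. Everything in it is an assumption made for the example: the cost unit (block reads and writes), the three candidate algorithms and especially the rough formulas, which are placeholders and not the rules of any real DBMS.

```python
def choose_join_algorithm(blocks_r1, blocks_r2, memory_blocks, already_sorted):
    """Return (name, cost) of the candidate with the lowest estimated block I/O."""
    total = blocks_r1 + blocks_r2
    small, large = sorted((blocks_r1, blocks_r2))
    estimates = {
        # If the smaller file fits in memory, each file is read once;
        # otherwise the larger file is re-read for every block of the smaller one.
        "nested loop": total if small <= memory_blocks else small + small * large,
        # Sorted inputs are merged in one pass; the factor 5 for unsorted
        # inputs is a placeholder for the extra sorting passes.
        "sort-merge": total if already_sorted else 5 * total,
        # Read both files, write the groups, read the groups again.
        "hash": 3 * total,
    }
    name = min(estimates, key=estimates.get)
    return name, estimates[name]
```

With these placeholder formulas, `choose_join_algorithm(1000, 5000, 100, True)` prefers sort-merge, while the same files unsorted lead to the hash-based algorithm.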
7 DATA INTEGRITY AND DATA SECURITY
7.1 Integrity preservation
The term refers to preserving the accuracy and consistency of data.
Integrity can easily be disrupted by incorrect input from careless
users or by incorrectly working
application programs.
The integrity of the base is looked after by the DBMS, which
allows the database designer to define
so-called constraints.
7.1.1 Domain constraints
- expressed through the requirement that an attribute value must belong
to the given domain.
The requirement that the value of a primary attribute must not be empty
also belongs to domain integrity protection.
E.g.
In the relation STUDENT there is the attribute DOB with the
following constraint:
it is an integer between 10 and 60.
Since the list of available types is limited, the closest type is
SMALLINT (this prevents the value 12.5, but -5 is
still possible).
Some DBMSs allow more precise conditions to be stated in the CREATE
command
(we declare DOB as type SMALLINT, with the additional condition
(DOB>=10) and (DOB<=60)).
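One way to try such a condition out is a CHECK clause in SQLite, driven here from Python's standard `sqlite3` module. The choice of SQLite, the in-memory database and the helper name `try_insert` are assumptions of this sketch; other DBMSs may use a different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENT (
        S_ID INTEGER NOT NULL PRIMARY KEY,
        DOB  SMALLINT CHECK (DOB >= 10 AND DOB <= 60)
    )
""")

def try_insert(s_id, dob):
    """Return True if the DBMS accepts the tuple, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO STUDENT VALUES (?, ?)", (s_id, dob))
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert(1, 20)     # inside the domain
rejected = try_insert(2, -5)     # outside the domain, refused by the CHECK clause
```

The value -5, which the type SMALLINT alone would allow, is now refused by the DBMS itself.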
7.1.2 Relation constraints
Correct relationships between the attributes within a relation
should be maintained (e.g. functional dependencies).
The most important example of such a constraint is the
requirement that two tuples within the same relation must
not have the same key value.
E.g.
CREATE TABLE STUDENT
(S_ID INTEGER NOT NULL,
S_NAME CHAR(20),
S_DOB SMALLINT,
PRIMARY KEY (S_ID));
7.1.3 Referential integrity constraints
The accuracy and consistency of the connections among relations are
maintained.
These are the constraints which refer to a foreign key, i.e. to an
attribute in one relation which is at the same time the primary
key in another relation.
Newer SQL standards provide the FOREIGN KEY clause in the
CREATE command, by which such a constraint is declared.
CREATE TABLE REPORT
(S_ID INTEGER NOT NULL,
P_ID INTEGER NOT NULL,
I_MARK SMALLINT,
PRIMARY KEY (S_ID,P_ID),
FOREIGN KEY(S_ID) REFERENCES STUDENT,
FOREIGN KEY(P_ID) REFERENCES PROGRAM);
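The effect of these clauses can be observed in SQLite through Python's `sqlite3` module. The reduced STUDENT and PROGRAM tables, the sample rows and the helper `try_report` are assumptions of this sketch; note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE STUDENT (S_ID INTEGER NOT NULL, PRIMARY KEY (S_ID));
    CREATE TABLE PROGRAM (P_ID INTEGER NOT NULL, PRIMARY KEY (P_ID));
    CREATE TABLE REPORT (
        S_ID INTEGER NOT NULL,
        P_ID INTEGER NOT NULL,
        I_MARK SMALLINT,
        PRIMARY KEY (S_ID, P_ID),
        FOREIGN KEY (S_ID) REFERENCES STUDENT,
        FOREIGN KEY (P_ID) REFERENCES PROGRAM);
    INSERT INTO STUDENT VALUES (1);
    INSERT INTO PROGRAM VALUES (7);
""")

def try_report(s_id, p_id, mark):
    """Return True if the REPORT tuple is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO REPORT VALUES (?, ?, ?)", (s_id, p_id, mark))
        return True
    except sqlite3.IntegrityError:
        return False

ok = try_report(1, 7, 5)          # student 1 and program 7 both exist
dangling = try_report(2, 7, 4)    # student 2 does not exist
```

The second insert is refused because it would leave a REPORT tuple pointing at a student that is not in STUDENT.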
7.1.4 Integrity preservation: conclusion
Today’s DBMSs support various kinds of constraints, but it is
not possible to express every conceivable constraint.
On the other hand, each constraint represents a burden
during data update.
Constraints should therefore not be overused.
7.2 Concurrent access
A DBMS should allow several users to access the data simultaneously
(multi-user work).
The simultaneity is usually virtual (time sharing on the
same computer).
In such a case the DBMS must coordinate conflicting actions very
carefully.
Each user must have the impression of being the only one
using the database.
7.2.1 Transactions
A user works with the database through transactions.
A transaction takes the base from one consistent state
into another consistent state (the states between individual operations
within the transaction may be inconsistent).
Database integrity: a transaction must either be fully
completed or not be performed at all.
In a multi-user base, several transactions are executed
simultaneously.
7.2.2 Serialisability
Basic operations which belong to different transactions are
interleaved in time.
Serialisability is the property that the effect of the simultaneous
execution of transactions is equivalent to the effect of some serial
(one after another) execution of the same transactions.
E.g.
Two travellers arrive at two different airline agencies at the
same time and want to buy a ticket for the same flight.
At that moment there is only one vacant seat.
(Without serialisability, both agencies could see the seat as free before
either records the sale, and two passengers would be sitting on the
same seat.)
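The airline example can be replayed deterministically. In this sketch each sale is split into a read step and a write step so that the two interleavings can be compared; the dictionary standing in for the database and the function names are assumptions made for illustration.

```python
def read_free(db):
    """First step of a sale: read the number of vacant seats."""
    return db["free_seats"]

def sell(db, free_seen, buyer):
    """Second step: sell a seat based on the value read earlier."""
    if free_seen > 0:
        db["free_seats"] = free_seen - 1
        db["sold_to"].append(buyer)

# Interleaved execution: both agencies read before either of them writes.
db = {"free_seats": 1, "sold_to": []}
seen_a = read_free(db)
seen_b = read_free(db)
sell(db, seen_a, "A")
sell(db, seen_b, "B")
interleaved = list(db["sold_to"])

# Serial execution: the second sale starts after the first one has finished.
db = {"free_seats": 1, "sold_to": []}
sell(db, read_free(db), "A")
sell(db, read_free(db), "B")
serial = list(db["sold_to"])
```

The interleaved schedule sells the single seat twice, an outcome that no serial order of the two sales can produce, so that schedule is not serialisable.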
7.2.3 Locks and locking
Locks are ancillary data used to coordinate conflicting actions.
The base is divided into several parts, and one lock corresponds to each part.
A transaction which wants to access a datum must first “take”
the corresponding lock and thereby lock the respective part of the base.
As soon as the operation is executed, the transaction
must “return” the lock and in that way unlock the data.
When a transaction comes across data which are already
locked, it must wait until they are unlocked by the previous
transaction.
The size of the part of the base that one lock protects defines the
locking granularity.
The use of locks hides a danger: the possibility of mutual
blocking of two or more transactions (deadlock).
7.2.4 Two-phase locking protocol
If in every transaction all lock operations happen before
the first unlock operation, then any parallel execution of these
transactions is guaranteed to be serialisable. This rule is called the
two-phase locking protocol.
E.g.
Two passengers want to switch places on a plane.
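The seat swap can be sketched with Python threads, one lock per seat. The seat numbers, passenger names and the fixed order in which locks are taken (added here to avoid deadlock) are assumptions of this example and not part of the protocol itself.

```python
import threading

seats = {"12A": "Ana", "12B": "Ivo"}
locks = {seat: threading.Lock() for seat in seats}

def swap(seat_x, seat_y):
    """Transaction that exchanges the passengers of two seats."""
    first, second = sorted((seat_x, seat_y))   # fixed order: an extra rule against deadlock
    # Phase 1: take every lock the transaction needs.
    locks[first].acquire()
    locks[second].acquire()
    try:
        seats[seat_x], seats[seat_y] = seats[seat_y], seats[seat_x]
    finally:
        # Phase 2: locks are returned only after all of them have been taken.
        locks[second].release()
        locks[first].release()

threads = [threading.Thread(target=swap, args=("12A", "12B")) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because no lock is released before both have been taken, no other transaction can ever observe one passenger moved and the other not.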
7.2.5 Timestamps
The two-phase locking protocol is based on locks, but there are also
methods which do not use locks. One such technique is based on
timestamps.
An identification number, the so-called timestamp, is
assigned to each transaction.
Reading and changing the same datum are allowed only if they
are executed in the order corresponding to the order of the
transaction timestamps.
If the order is violated, one of the transactions must be
stopped, neutralised and started again with a larger timestamp.
E.g.
If T1 has timestamp t1, T2 has timestamp t2 > t1, T1 wants to read datum
x, and T2 has already changed the same x, then T1 must be
neutralised.
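The rule can be sketched with the basic timestamp-ordering checks, in which every datum remembers the largest timestamps that have read and changed it. The class and function names are assumptions of this illustration.

```python
class Abort(Exception):
    """The transaction must be neutralised and restarted with a larger timestamp."""

class Datum:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0     # largest timestamp of a transaction that read the datum
        self.write_ts = 0    # largest timestamp of a transaction that changed the datum

def read(x, ts):
    if ts < x.write_ts:                      # a younger transaction already changed x
        raise Abort
    x.read_ts = max(x.read_ts, ts)
    return x.value

def write(x, ts, value):
    if ts < x.read_ts or ts < x.write_ts:    # a younger transaction already used x
        raise Abort
    x.value, x.write_ts = value, ts
```

In the example from the slide, T2 calls `write(x, t2, ...)` first; the later `read(x, t1)` with t1 < t2 then raises `Abort`, which corresponds to neutralising T1.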
7.3 Recovery
During its operation, a database might end up in an “incorrect” state for
one of the following reasons:
• An aborted transaction (statement ROLLBACK WORK, power
failure…)
• Incorrect operation of the transaction itself
• DBMS or operating system errors
• Hardware errors or physical damage to the computer
The DBMS is expected to enable “base recovery” in all the mentioned cases.
Apart from the base itself, the DBMS must maintain some ancillary
data...
Which data are these?
Apart from the base itself, the DBMS must maintain some ancillary
data, namely:
1. Backup copy
2. Journal file (log file)
Different types of recovery are possible, such as:
• Neutralisation of interrupted or incorrect transactions
• Re-establishing the base after considerable damage
7.3.1 Back-up database copy
The entire base is copied to another medium (magnetic
tape or another disk). This is done when the base is in a
consistent state.
While copying is in progress, transactions that change data must not
be executed.
Because copying takes a long time, it is executed
when the database users are not present.
It is executed at intervals defined in advance.
7.3.2 Journal file
It is a “historic” file: every transaction executed since the last backup
copy of the database is recorded in it.
For one transaction the journal records:
• The transaction identifier,
• The address of each datum changed by the transaction,
together with the previous value and the new value of the datum,
• Checkpoints: start, commit and rollback.
7.3.3 Transaction neutralisation
It is a frequently executed operation, performed by the DBMS
automatically.
The data changed by the transaction are given back their previous
values.
The procedure is called roll-back:
• The journal is read and the old values of the data
changed by the transaction are found
• These old values are written back to the appropriate places
in the base.
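Roll-back from the journal can be sketched as follows. The dictionary used as the base, the journal entry layout (transaction, address, old value, new value) and the function names are assumptions of this example.

```python
base = {"x": 5, "y": 7}
journal = []      # entries: (transaction id, address, old value, new value)

def change(tid, address, new_value):
    """Record the change in the journal, then apply it to the base."""
    journal.append((tid, address, base[address], new_value))
    base[address] = new_value

def roll_back(tid):
    """Give every datum changed by the transaction its previous value."""
    # Entries are undone newest first, so the oldest previous value is restored last.
    for t, address, old, _ in reversed(journal):
        if t == tid:
            base[address] = old

change("T1", "x", 6)
change("T1", "x", 9)
change("T1", "y", 0)
roll_back("T1")
```

After the roll-back the base again holds the values it had before T1 started.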
What happens if some other transaction has already read a value written by the
transaction which has just been neutralised?
Obviously, this second transaction should also be neutralised.
The procedure would be rather complex, because it
involves the coordination of several transactions carried out
simultaneously.
If the DBMS executes only one transaction at a time,
then transaction neutralisation is executed in two phases
using the deferred write technique.
With this technique, data changes are not written into the
base immediately, but only after the commit point of the transaction
has been entered in the journal.
Diagram: transaction → changes → journal → commit → base
7.3.4 Re-establishing the base
An extraordinary and extensive operation, executed by
the base administrator.
It is called roll-forward and consists of re-entering the recorded changes.
The procedure is the following:
• Restore the database state recorded in the last backup
copy
• Determine the last valid checkpoint in the journal
• Read the part of the journal from the beginning up to that
checkpoint
• Re-apply to the database the changes of each committed
transaction from the observed part of the journal
After that, the database state which corresponds to the chosen
checkpoint is established.
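The procedure can be sketched in a few lines. The journal layout, the sample transactions and the function name are assumptions made for this illustration; the journal is assumed to end at the chosen checkpoint.

```python
backup = {"x": 5, "y": 7}
# Hypothetical journal written since the backup copy was made.
journal = [
    ("start", "T1"), ("change", "T1", "x", 5, 6), ("commit", "T1"),
    ("start", "T2"), ("change", "T2", "y", 7, 0),            # T2 never committed
    ("start", "T3"), ("change", "T3", "y", 7, 8), ("commit", "T3"),
]

def roll_forward(backup, journal):
    """Rebuild the base from the backup copy and the journal."""
    base = dict(backup)                                       # state of the last backup copy
    committed = {entry[1] for entry in journal if entry[0] == "commit"}
    for entry in journal:
        if entry[0] == "change" and entry[1] in committed:
            _, _, address, _, new_value = entry
            base[address] = new_value                         # re-apply the recorded change
    return base

restored = roll_forward(backup, journal)
```

Changes of the uncommitted transaction T2 are skipped, so the restored base reflects only completed transactions.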
7.4 Protection against unauthorised access
This refers to the part of the software embedded in the DBMS
which takes care of data protection.
It is used to restrict database access to
authorised users.
It consists of the following:
• User identification
• Views as protection mechanisms
• Authorisations
7.4.1 User identification
A username and a password are assigned to each database
user.
A user must introduce himself/herself to the DBMS with the
username and prove his/her identity with the password.
The DBMS keeps a list of usernames and corresponding
passwords.
Protection is based on password confidentiality.
7.4.2 View as protection mechanism
Views are a means of achieving logical independence (subschema).
A view also serves as protection, because it lets an individual
user see only a part of the data stored in the database.
In SQL, a view is defined with the CREATE VIEW command;
its derivation from the global relations is described by a
SELECT command nested in the CREATE VIEW command.
The following examples show a global schema and corresponding views
through which protection against unauthorised users is achieved.
Example 1
EMPLOYEES (Z_ID, Z_NAME, Z_SALARY, Z_ADDRESS, Z_DEPARTMENT)
DEPARTMENT (O_ID, O_NAME, O_ID_HEAD)
The views are defined with the SQL command CREATE VIEW.
A view which a user can use to access employee
data, but not salaries:
CREATE VIEW EMPL1
AS SELECT Z_ID,Z_NAME,Z_ADDRESS,Z_DEPARTMENT
FROM EMPLOYEES;
Example 2
A view which a user can use to access data about
employees, but only those employed in department No. 3:
CREATE VIEW EMPL2
AS SELECT *
FROM EMPLOYEES
WHERE Z_DEPARTMENT=3;
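Both views can be tried out in SQLite through Python's `sqlite3` module. The sample rows are invented for the illustration, and the sketch assumes that Z_DEPARTMENT holds the department number of the employee, which is why the second view filters on that attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEES (Z_ID INTEGER PRIMARY KEY, Z_NAME TEXT,
                            Z_SALARY INTEGER, Z_ADDRESS TEXT, Z_DEPARTMENT INTEGER);
    INSERT INTO EMPLOYEES VALUES (1, 'Ana', 900, 'Zagreb', 3);
    INSERT INTO EMPLOYEES VALUES (2, 'Ivo', 800, 'Split', 1);
    CREATE VIEW EMPL1 AS
        SELECT Z_ID, Z_NAME, Z_ADDRESS, Z_DEPARTMENT FROM EMPLOYEES;
    CREATE VIEW EMPL2 AS
        SELECT * FROM EMPLOYEES WHERE Z_DEPARTMENT = 3;
""")

# Column names visible through the first view: the salary is not among them.
empl1_columns = [col[0] for col in conn.execute("SELECT * FROM EMPL1").description]
# Rows visible through the second view: only department 3.
empl2_names = conn.execute("SELECT Z_NAME FROM EMPL2").fetchall()
```

A user who is allowed to query only EMPL1 or EMPL2 never sees the hidden column or the rows of other departments.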
7.4.3 Authorisation
A view determines which data a user can see, but not what the user may do with
them.
In addition to views, the DBMS therefore associates
authorisations with each user.
The most frequent authorisations are:
SELECT – authorisation to read and retrieve data from the
given relation (view);
INSERT – authorisation to insert new tuples into the given relation
(view);
DELETE – authorisation to delete tuples from the given relation
(view);
UPDATE – authorisation to change data in the given
relation (view);
ALTER – authorisation to change the structure of the given
relation (e.g. adding new attributes);
CONNECT – authorisation for a user to log in and
work with the base;
DBA – authorisation which gives a user the status of base
administrator.
8 EXAMPLES – RELATIONAL ALGEBRA AND RELATIONAL CALCULUS
8.1 Relational algebra operations
Relational algebra is a query language for relational databases.
It consists of relational operations which, based on one or
more relations, calculate a new relation.
The usual operations are:
• Union (set union)
• Intersect (set intersection)
• Minus (set difference)
• Where (selection)
• Times (Cartesian product)
• Join (natural join)
• Divideby (division)
• [ ] (projection)
8.1.1 Gourmand queries
We consider a database of gourmands, dishes and
restaurants, with the following ER schema.
Figure: Gourmand database ER schema (entities GOURMAND, RESTAURANT and DISH; relationships VISITS, SERVES and LOVES)
Assumption: only the entity names, without any additional data, are
stored in the database. The relational schema then
looks as follows:
VISITS (GOURMAND, RESTAURANT),
SERVES (RESTAURANT, DISH),
LOVES (GOURMAND, DISH).
Task: Write the following queries for the given gourmand
database in relational algebra.
Query 1: Make a list of all restaurants that serve a dish that
gourmand Joe loves.
Solution:
JOE’S_DISH := (LOVES where GOURMAND = “Joe”) [DISH];
JOE’S_REST := (JOE’S_DISH join SERVES) [RESTAURANT].
Query 2: Make a list of all gourmands who visit at least one
restaurant that serves a dish they love.
Solution:
RESULT := ((LOVES join SERVES) join VISITS) [GOURMAND].
Query 3:
Make a list of all restaurants that serve all the dishes Joe
loves.
Solution:
Suppose the divideby operation is not available.
JOE’S_DISH := (LOVES where GOURMAND = “Joe”) [DISH];
ALL_REST := SERVES [RESTAURANT];
ALL_COM := JOE’S_DISH times ALL_REST;
NOT_JOE’S := (ALL_COM minus SERVES) [RESTAURANT];
JOE’S_REST := ALL_REST minus NOT_JOE’S.
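The same sequence of steps can be followed with Python sets, where each relation is a set of tuples. The sample gourmands, dishes and restaurants are invented for this illustration, and the variable names mirror the intermediate relations above.

```python
LOVES = {("Joe", "pizza"), ("Joe", "soup"), ("Eva", "soup")}      # (GOURMAND, DISH)
SERVES = {("Roma", "pizza"), ("Roma", "soup"), ("Bistro", "soup")}  # (RESTAURANT, DISH)

joes_dish = {d for g, d in LOVES if g == "Joe"}            # selection + projection
all_rest = {r for r, _ in SERVES}                          # projection
all_com = {(r, d) for r in all_rest for d in joes_dish}    # Cartesian product
not_joes = {r for r, _ in all_com - SERVES}                # pairs that are NOT served
joes_rest = all_rest - not_joes                            # restaurants missing nothing
```

A restaurant drops out as soon as one combination of it with a dish Joe loves is missing from SERVES, which is exactly the effect of division.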
8.1.2 Library queries
We consider a library database. It contains
relations about books, members and book loans.
The relational schema looks as follows:
BOOK (CAT_ID, C_TITLE, C_AUTHOR, C_PUBLISHER),
MEMBER (MEMBER_ID, CL_NAME, CL_ADDRESS),
LOAN (LOA_CAT_ID, LOA_MEMBER_ID, LOA_DATE)
In the joins below, LOA_CAT_ID is matched with CAT_ID and LOA_MEMBER_ID
with MEMBER_ID.
Write the following queries in relational algebra based on this relational
schema:
Query 1: Find the titles and authors of all books published by
“Prentice_Hall”.
Solution:
RESULT := (BOOK where C_PUBLISHER = “Prentice_Hall”) [C_TITLE,
C_AUTHOR]
Query 2:
Find the titles of all books that were borrowed on
15 February 2010.
Solution:
TEMP := (LOAN where LOA_DATE = “15-FEB-2010”);
RESULT := (BOOK join TEMP) [C_TITLE]
Query 3:
Write the title and author of every book published by “Prentice_Hall”
which was lent to the member named “Ivan Ivić” before 21 July 2007.
Solution:
TEMP1 := (MEMBER where CL_NAME = “Ivan Ivić”) [MEMBER_ID];
TEMP2 := ((TEMP1 join LOAN) where LOA_DATE < “21-JUL-2007”)
[LOA_CAT_ID];
RESULT := ((TEMP2 join BOOK) where C_PUBLISHER = “Prentice_Hall”)
[C_TITLE, C_AUTHOR]
Query 4: Write the titles and authors of all books that have never been
lent.
Solution:
TEMP := BOOK [CAT_ID] minus LOAN [LOA_CAT_ID];
RESULT := (TEMP join BOOK) [C_TITLE, C_AUTHOR].
8.1.3 Proof of equivalence
Often, the same query can be written as different algebraic
expressions.
The proof that two relational algebra expressions are
equivalent is conducted in the following way:
• Show that any tuple that belongs to the value of the first
expression must also belong to the value of the second expression.
• Then show that any tuple that belongs to the value of the second
expression must also belong to the value of the first
expression.
8.2 Use of relational calculus
Relational calculus is another query language for relational databases. It is based on the
notation of mathematical logic (predicate calculus). There are two
versions:
• Tuple-oriented version – variables represent complete tuples
• Domain-oriented version – variables represent the values of
individual attributes
A query is formed by writing a predicate which the tuples, or the
attribute values, of the result must satisfy.
Compared to relational algebra, relational calculus is mostly
“nonprocedural”: it only defines the result which we want to
obtain, without defining the procedure for obtaining it.
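A set comprehension in Python reads much like such a predicate and may help to see the difference from the step-by-step algebraic style. The sketch restates gourmand Query 1 (restaurants serving a dish Joe loves) over invented sample data; it is an analogy, not relational calculus syntax.

```python
LOVES = {("Joe", "pizza"), ("Eva", "soup")}        # (GOURMAND, DISH)
SERVES = {("Roma", "pizza"), ("Bistro", "soup")}   # (RESTAURANT, DISH)

# "The set of restaurants r such that SERVES contains (r, d) and there exists
#  a tuple in LOVES whose gourmand is Joe and whose dish is d."
joes_rest = {
    r
    for (r, d) in SERVES
    if any(g == "Joe" and loved == d for (g, loved) in LOVES)
}
```

The expression states only the condition a restaurant must satisfy; no order of joins, selections or projections is prescribed.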
9 LITERATURE
Required reading:
1. Ž. Knok: lecture notes, MEV, Čakovec, 2010.
2. M. Radovan: Baza podataka, Informator, Zagreb, 1993.
Additional reading:
1. S. Tkalac: Relacijski model podataka, Informator, Zagreb, 1988.
2. J.D. Ullman: Database and Knowledge-base Systems, Computer Science Press,
1999.