Databases

5.5.2017. Željko Knok

Contents by topics:
1 Introduction
2 Data modelling
3 Relational database languages
4 SQL language for relational databases
5 Physical database structure
6 Implementation of relational operations
7 Data integrity and safety
8 Examples of relational calculus and relational algebra
9 References

1 INTRODUCTION

Database definition

A database is a set of related data (Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database Systems, Addison-Wesley).

1.1 Data storage mediums

Non-electronic mediums
• Punched paper cards
• Punched paper tapes

Electronic mediums
• HD (disks)
• Tapes
• CD
• DVD
• USB

1.2 Data organisation

All data are stored in files, which may be:
• Classical files (sequential, relative, index…)
• Files organised in databases

Properties of a database:
• A database represents one aspect (abstraction) of a part of the real world. Changes in that part of the world are reflected in the database.
• A database is designed, built and filled with data for a specific purpose.
• A database is a set of mutually connected data stored in external computer memory.

1.3 Database architecture

• Physical level: the representation and arrangement of data on external memory units.
• Global logical level: the logical structure of the entire database, as seen by the database designer and administrator. The record of the logical definition is called a schema.
• Local logical level: the logical image of the part of the database used by a specific application. The record of one local logical definition is called a view.

1.4 Database Management System

A DBMS (Database Management System) is a set of programs which enables a user to create and maintain a database (database server).

A DBMS is a general-purpose software system which:
• Creates a physical representation of a database in accordance with the required logical structure
• Performs all data operations
• Takes care of data safety
• Automates database administration tasks

1.5 Data model

A data model is a set of rules which defines the logical structure of a database and is the basis for:
• drafting
• designing
• database implementation

1.6 Database models

• Relational model
  Data and links between data are shown by "rectangular" tables.
• Object model
  Inspired by object-oriented programming languages. The database is a set of permanently stored objects, which consist of internal data and "methods" (operations) for manipulating the data. Each object belongs to a class. Inheritance relationships are established between the classes, i.e. operations are shared between them.

1.7 Objectives

• Physical data independence
• Logical data independence
• Flexible data access
• Simultaneous data access
• Integrity protection
• Possibility of recovery after failure
• Protection against unauthorised use
• Satisfactory access speed
• Possibility of adjustment and control

1.8 Database example

• A database containing university data on students, courses, teachers, exams, study programmes, …
• The data in a database are organised as sets of data with the same properties: entity sets.
• For each data set the basic elements (attributes) should be defined, e.g. Student (Student course book, Name, Surname, Date of birth, Study programme, ...).
• For each basic element the data type must be defined, e.g. Name is a string(20), Student course book is an integer, Date of birth is a date, …

Example of student data (structure):

Student course book | Name  | Surname | Date of birth | Study programme
--------------------+-------+---------+---------------+----------------
23401               | Marko | Marić   | 11.12.1985    | Computing
23402               | Ana   | Anić    | 07.06.1990    | Management
23403               | Iva   | Ivić    | 12.11.1991    | Management

The header row is the database scheme and its columns are the entity attributes. Student course book is the primary key. Each row is the description of one entity (a tuple).

1.9 Database compared to file organisation

• Properties of file organisation
  • Files contain only data.
  • Data are described (data structure) in the program
  • Information on mutual relationships between the data is not stored in files
  • Specific data properties are not stored in files
  • There is no possibility of storing undefined data
• Database properties
  • Databases contain data and data descriptions
  • Databases keep data about relationships between data
  • Databases store properties of individual values
  • Undefined data can be used
  • Access control is enabled
  • Update control is enabled

1.10 Database users

• DB administrator: administers the database
• DB designer: designs the database
• Database end users
  • occasional
  • inexperienced
  • sophisticated users
• Behind-the-scenes users
  • DBMS designers and implementers
  • Tool creators
  • Operators and maintenance staff

2 DATA MODELLING

How do we create a database schema that conforms to the rules of the relational model?

2.1 Entity-relationship modelling

• Represents the conceptual schema, as an abstraction of the real world
• In entity-relationship modelling the world is observed through three categories:
  • Entities: objects or events of interest
  • Relationships: relationships between the entities of interest
  • Attributes: properties of entities and relationships that are of interest

2.2 Entities and attributes

An entity is anything in the real world:
• Object
• Event
• Phenomenon

Attributes describe an entity:
• e.g. attributes of a house are: address, number of floors, facade colour…
• some attributes can have their own attributes; such an attribute should be considered a new entity (e.g. car model)

The entity name, together with the associated attributes, defines the entity type.

A candidate key is an attribute or set of attributes whose values uniquely identify an instance of the entity type.

2.3 Relationships

Relationships are established between two or more entity types (e.g. the relationship PLAY_FOR between the entity types PLAYER and TEAM). A relationship is a binary or k-ary relation between instances of entity types.

The functionality of a relationship can be:
1. One-to-one (1:1), e.g. the relationship IS_HEAD between the entity types TEACHER and DEPARTMENT (college department)
2. One-to-many (1:N), e.g. the relationship TEACH between the entity types TEACHER and COURSE
3. Many-to-many (M:N), e.g. the relationship ENROLLED between the entity types STUDENT and COURSE

2.4 Complex relationships

Real situations involve relationships that are more complex than those mentioned above:
• Involuted relationships
  An entity type is related to itself.
• Sub-types
  Entity type E1 is a sub-type of entity type E2 if every instance of E1 is also an instance of E2.
• Ternary relationships
  A relationship is established between three entity types, e.g. companies, the products they manufacture and the countries to which they export their products.

2.5 E-R schema diagram

An ER schema is usually shown as a diagram in which rectangles represent entity types and rhombi represent relationships.

(Figure: E-R diagram with the entity types DEPARTMENT, COURSE, TEACHER and STUDENT and the relationships offers (1:N), is_head (1:1), is_in (1:N), teaches (1:N) and enrolled (N:M).)

2.6 The role of E-R schema

• The ER model is simple enough to be used by people of different professions.
• The ER schema serves for communication between the database designer and the user in the earliest stage of database development.
• Existing DBMSs cannot implement the ER schema directly, so it has to be elaborated further.

2.7 Relational model

Data and relationships between the data are shown by "rectangular" tables.
In the mid-1980s the relational model prevailed, and today most DBMSs use it.

A database consists of a set of rectangular tables, called relations. Each relation has a name which distinguishes it from the others in the same database. The columns of a relation represent attributes, and all values of an attribute have the same data type. A row is called a tuple. A relation cannot contain two identical tuples. The number of attributes is the degree of a relation, and the number of tuples is its cardinality.

2.8 Example of a relation: CAR

CAR
REG_NUMBER | MANUFACTURER | MODEL   | YEAR
-----------+--------------+---------+-----
CD234      | Ford         | Fiesta  | 1997
XC294      | Nissan       | Primera | 1998
AU930      | Ford         | Escort  | 2002
PD402      | Fiat         | Punto   | 2008
VE838      | Volkswagen   | Golf    | 2005

The key REG_NUMBER of the relation CAR is a subset of the attributes of CAR with the following properties:
1. The values of the attributes in REG_NUMBER uniquely determine a tuple in CAR, so CAR cannot contain two tuples with the same values of the attributes in REG_NUMBER.
2. If any attribute is removed from REG_NUMBER, property 1 no longer holds.

2.9 Primary key

One of the keys is chosen as the primary key. The attributes that make up the primary key are called primary attributes. A primary attribute cannot have a null value in any tuple.

The structure of a relation is described briefly by a relational schema, which consists of the relation name and a list of attribute names in parentheses. Primary attributes are underlined. For example, the relational schema of CAR, with the primary attribute REG_NUMBER, looks like this:

CAR (REG_NUMBER, MANUFACTURER, MODEL, YEAR)

2.10 Translation of E-R schema into relational model

1. Translation of entity types
Each entity type is represented by one relation. The attributes of the type become attributes of the relation. The primary key of the entity becomes the primary key of the relation.
The entity type STUDENT is represented by the relation
STUDENT (NO_STUDENT_COURSE_BOOK, NAME_STUDENT, ADDRESS, GENDER, …)

2. Translation of binary relationships
If entity type E2 has obligatory membership in an (N:1) relationship with type E1, then the relation for E2 should include the primary attributes of E1:
COURSE (NO_COURSE, NAME_DEPARTMENT, NAME, SEMESTER, …)
A key of one relation that is copied into another relation is called a foreign key in that relation.

3. Translation of involuted relationships
The entity type PERSON with the involuted (1:1) relationship MARRIED_TO is best represented by two relations:
PERSON (PIN, SURNAME_NAME, ADDRESS, …)
MARRIAGE (PIN_HUSBAND, PIN_WIFE, DATE_MARRIAGE)

4. Translation of entity subtypes
A subtype is represented by a relation which contains the primary attributes of the superior type and the attributes specific to the subtype. For example, a hierarchy of types is represented by the relations
PERSON (PIN, … attributes common to all types of persons …)
STUDENT (PIN, … attributes specific to students …)
TEACHER (PIN, … attributes specific to teachers …)
LECTURER (PIN, … attributes specific to lecturers …)

5. Translation of ternary relationships
A ternary relationship is represented by a relation which contains the primary attributes of all three entity types, together with any attributes of the relationship itself.
COMPANY (NAME_COMPANY, …)
PRODUCT (NAME_PRODUCT, …)
COUNTRY (NAME_COUNTRY, …)
EXPORTS (NAME_COMPANY, NAME_PRODUCT, NAME_COUNTRY)

2.11 Relational model normalisation

• A relational schema obtained from the ER schema by the rules above can contain imperfections which must be removed before implementation.
• The process by which the existing schema is modified is called normalisation.
• Normalisation is based on the concept of normal forms.
• The normal forms are the first normal form, the second normal form and so on, denoted 1NF, 2NF, …

2.12 First Normal Form

Conditions for 1NF:
1. Data are connected on the logical level, not the physical level (address on the disk)
2. For each entity type there is one primary key
3. Each field within an entity has its own name, which does not repeat

The first two conditions are mandatory for relational databases, whereas the third condition is not satisfied in the following example (because the field COUNTY repeats):

Entity: COUNTRY
COUNTRY (ID_COUNTRY, NAME, CAPITAL_CITY, COUNTY1, COUNTY2, …)

2.13 Second Normal Form

Conditions for 2NF:
1. The conditions of 1NF must be satisfied
2. Each field within an entity which is not part of the primary key functionally depends on the entire (composite) key

The second normal form mostly applies to primary keys composed of two or more fields (composite primary keys).

Entity: STUDENT
STUDENT (CODE_FACULTY, NO_STUDENT_COURSE_BOOK, NAME_STUDENT, …)

2.14 Third Normal Form

Conditions for 3NF:
1. The conditions of 2NF must be satisfied
2. There is no transitive dependency of any field on any key

Transitivity: if a R b and b R c, then a R c follows.

Entity: COUNTY
COUNTY (COUNTY, COUNTRY, CAPITAL_CITY_COUNTRY, …)

The field CAPITAL_CITY_COUNTRY depends on the field COUNTRY, and the field COUNTRY depends on the field COUNTY. The relation is not in the third normal form, so it should be divided into two relations: (COUNTY, COUNTRY) and (COUNTRY, CAPITAL_CITY_COUNTRY).

2.15 Normal forms consequences

• There are higher-level normal forms
• Normal forms are a set of rules useful for modelling general database cases
• Complex data types can sometimes require compromises: deviation from an ideal solution
• Such complex data types occur very often in large and complex databases
• Result: object-relational databases

3 LANGUAGES FOR RELATIONAL DATABASES

3.1 Relational algebra

• Queries are written as algebraic expressions, built from relations and unary and binary operators
• Each algebraic expression represents one query (search/browse)
• A simplified version of the University database will serve as an example:

STUDENT (ST_COURSE_BOOK, ST_NAME, ST_YEAR)
COURSE (CO_ID, CO_NAME, CO_TEA)
REPORT (RE_ID, RE_CO_ID, RE_RESULT)
TEACHER (TEA_NAME, TEA_ROOM)

STUDENT
ST_COURSE_BOOK | ST_NAME | ST_YEAR
---------------+---------+--------
876543         | Jones   | 2
864532         | Burns   | 1
856434         | Cairns  | 3
876421         | Hughes  | 2

REPORT
RE_ID  | RE_CO_ID | RE_RESULT
-------+----------+----------
876543 | 216      | 82
864532 | 216      | 75
864532 | 312      | 71
856434 | 121      | 49
876421 | 312      | 39
876543 | 251      | 70
864532 | 251      | 69
864532 | 121      | 78

COURSE
CO_ID | CO_NAME         | CO_TEA
------+-----------------+-------
216   | Databases       | Black
312   | Programming     | Welsh
251   | Numerical maths | Quinn
121   | PAUP            | Holt

TEACHER
TEA_NAME | TEA_ROOM
---------+---------
Black    | 1017
Welsh    | 1024
Holt     | 2014
Quinn    | 1010

3.1.1 Set operators

Relations are sets of tuples, so set operators can be applied to them. Let R and S be relations. Then:

R union S ...... the set of tuples which are in R or in S (or in both relations)
R intersect S .. the set of tuples which are in R and also in S
R minus S ...... the set of tuples which are in R but not in S

To apply these operators, the relations R and S must be compatible (they must have the same degree and the same attributes: names and types).

Notice that the following always holds:

R intersect S = R minus (R minus S)

What can be concluded from this?
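
The set operators can be tried out on small relations. The sketch below models a relation as a Python set of tuples, which is an assumption made only for illustration; the contents of R and S are made up. It also checks the identity R intersect S = R minus (R minus S) on this data.

```python
# Relations modelled as sets of tuples (illustrative data, not from the slides).
R = {(876543, "Jones", 2), (864532, "Burns", 1), (856434, "Cairns", 3)}
S = {(856434, "Cairns", 3), (871290, "Noble", 1)}

union = R | S            # tuples in R or in S (or in both)
intersection = R & S     # tuples in R and also in S
difference = R - S       # tuples in R but not in S

# The identity from the slide: intersect can be expressed through minus.
identity_holds = intersection == R - (R - S)

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
print(identity_holds)
```

Because the identity holds for any two compatible relations, intersect does not have to be treated as a basic operator: it can be derived from minus.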

3.1.1 Set operators (example)

Consider the relation NEW_STUDENT:

NEW_STUDENT
ST_COURSE_BOOK | ST_NAME | ST_YEAR
---------------+---------+--------
876542         | Smith   | 3
865698         | Turner  | 2
875923         | Murphy  | 2
856434         | Cairns  | 3
871290         | Noble   | 1

STUDENT union NEW_STUDENT
ST_COURSE_BOOK | ST_NAME | ST_YEAR
---------------+---------+--------
876543         | Jones   | 2
864532         | Burns   | 1
856434         | Cairns  | 3
876421         | Hughes  | 2
876542         | Smith   | 3
865698         | Turner  | 2
875923         | Murphy  | 2
871290         | Noble   | 1

STUDENT minus NEW_STUDENT
ST_COURSE_BOOK | ST_NAME | ST_YEAR
---------------+---------+--------
876543         | Jones   | 2
864532         | Burns   | 1
876421         | Hughes  | 2

STUDENT intersect NEW_STUDENT
ST_COURSE_BOOK | ST_NAME | ST_YEAR
---------------+---------+--------
856434         | Cairns  | 3

3.1.2 Selection

Selection is a unary operator which selects from a relation those tuples which satisfy a given Boolean condition. The selection on relation R according to the Boolean condition β is written R where β. The condition β is a formula consisting of:
• Operands, which are either constants or attributes
• Comparison operators =, <, >, ≤, ≥, ≠
• Logical operators and, or, not

3.1.3 Projection

Projection is a unary operator which selects the given attributes from a relation; duplicate tuples are eliminated from the resulting relation.

Examples:

Find the room numbers of all lecturers. Result:

TEA_ROOM
--------
1017
1024
2014
1010

Find the name of the lecturer who teaches course 312. Result:

CO_TEA
------
Welsh

3.1.4 Cartesian product

If R and S are relations of degrees n1 and n2, then the algebraic expression R times S gives the Cartesian product of R and S.

Examples:

1. For each student, list all the courses the student is not enrolled in.

ALL_ENROLLED := STUDENT[ST_COURSE_BOOK] times COURSE[CO_ID],
NOT_ENROLLED := ALL_ENROLLED minus REPORT[RE_ID, RE_CO_ID]

2. Find all pairs of students in the same year.

TEMP aliases STUDENT,
PAR := ((TEMP times STUDENT)
        where ((TEMP.ST_YEAR = STUDENT.ST_YEAR)
               and (TEMP.ST_COURSE_BOOK < STUDENT.ST_COURSE_BOOK)))
       [TEMP.ST_NAME, STUDENT.ST_NAME]
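
The second Cartesian product example can be traced step by step. The following sketch assumes the STUDENT relation from section 3.1, stored as a Python list of tuples; `itertools.product` plays the role of `times`, the `if` clause the role of `where`, and the tuple built in the comprehension the role of the projection.

```python
from itertools import product

# STUDENT relation from section 3.1: (ST_COURSE_BOOK, ST_NAME, ST_YEAR)
STUDENT = [
    (876543, "Jones", 2),
    (864532, "Burns", 1),
    (856434, "Cairns", 3),
    (876421, "Hughes", 2),
]

# TEMP times STUDENT, then the selection, then the projection on both names.
pairs = [
    (temp[1], student[1])
    for temp, student in product(STUDENT, STUDENT)
    if temp[2] == student[2] and temp[0] < student[0]
]

print(pairs)
```

The condition on the course book numbers keeps each pair only once and prevents a student from being paired with himself or herself.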

3.1.5 Natural join

Natural join is a binary operator applicable to two relations R and S which have at least one common attribute. R join S consists of all tuples obtained by joining one tuple from R with one tuple from S which have the same values of the common attributes.

Examples:

1. Names of all students enrolled in course 251.

QUERY1 := ((REPORT where RE_CO_ID = 251) join STUDENT) [ST_NAME]

2. Find the room number of the lecturer who teaches course 312.

QUERY2 := ((COURSE where CO_ID = 312) join TEACHER) [TEA_ROOM]

3.1.6 Other operators

1. Theta join: a combination of the Cartesian product and selection
2. Division: written divideby
3. Outer join: written outerjoin; used for finding data which do not satisfy a certain condition

Practice: Find the names of the students who did not enrol in any course!

3.2 Relational calculus

A query consists of a predicate which the tuples have to satisfy. There are two types:
1. Tuple-oriented calculus (where tuples are the basic objects)
2. Domain-oriented calculus (where attribute domains are the basic objects)

3.3 SQL Language

A query is expressed with the flexible command SELECT. The result of a query is considered a new temporary relation, derived from the permanent ones.

Structure of an SQL query:

    SELECT attributes FROM relation WHERE condition;

For inserting, changing and deleting data, the commands INSERT, UPDATE and DELETE are used.

Examples:
QUERY1: Find the numbers and names of all students in year 1.
QUERY2: Find the numbers and names of the students who enrolled in course 121.
QUERY3: Find all pairs of numbers of students who are in the same year of study.
QUERY4: Find all data about the students who did not enrol in course 121.

Remark: To answer these tasks, use the relations introduced at the beginning of this unit.
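
One possible way to work through these four queries is sketched below with Python's built-in `sqlite3` module and the STUDENT and REPORT data from section 3.1. The lower-case table names and the column name `st_id` (standing in for the student course book number) are assumptions made for the example; other formulations of each query are equally valid.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (st_id INTEGER PRIMARY KEY, st_name TEXT, st_year INTEGER);
    CREATE TABLE report  (re_id INTEGER, re_co_id INTEGER, re_result INTEGER);
    INSERT INTO student VALUES (876543,'Jones',2),(864532,'Burns',1),
                               (856434,'Cairns',3),(876421,'Hughes',2);
    INSERT INTO report VALUES (876543,216,82),(864532,216,75),(864532,312,71),
                              (856434,121,49),(876421,312,39),(876543,251,70),
                              (864532,251,69),(864532,121,78);
""")

# QUERY1: students in year 1
q1 = con.execute(
    "SELECT st_id, st_name FROM student WHERE st_year = 1").fetchall()

# QUERY2: students enrolled in course 121 (join of STUDENT and REPORT)
q2 = con.execute("""
    SELECT s.st_id, s.st_name FROM student s, report r
    WHERE s.st_id = r.re_id AND r.re_co_id = 121
    ORDER BY s.st_id""").fetchall()

# QUERY3: pairs of student numbers in the same year (self-join with aliases)
q3 = con.execute("""
    SELECT a.st_id, b.st_id FROM student a, student b
    WHERE a.st_year = b.st_year AND a.st_id < b.st_id""").fetchall()

# QUERY4: all data about students not enrolled in course 121 (subquery)
q4 = con.execute("""
    SELECT * FROM student
    WHERE st_id NOT IN (SELECT re_id FROM report WHERE re_co_id = 121)
    ORDER BY st_id""").fetchall()

for result in (q1, q2, q3, q4):
    print(result)
```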

4 SQL LANGUAGE FOR RELATIONAL DATABASES

4.1 Introduction to SQL Language

4.1.1 Type of data

When a relation is defined, the type of each attribute has to be specified. An attribute may be a date, number, name, text, internet address of a computer, logical value such as true/false, and so on. These are some of the more important and more frequently used attribute types:

INT            Integer, usually 4 bytes, although it depends on the DBMS
BIGINT         Integer stored in 8 bytes
SMALLINT       Integer stored in 2 bytes
REAL           Floating-point number stored in 4 bytes
NUMERIC(p,s)   Arbitrary-precision decimal number
BOOLEAN        True/false (TRUE/FALSE, t/f, y/n, 1/0, ...)
CHAR(n)        String: a sequence of letters/numbers/characters of fixed length n
VARCHAR(n)     String: a sequence of letters/numbers/characters of maximum length n
TEXT           String of arbitrary length (MEMO)
DATE           Date, e.g. 2002-10-21
TIME           Time, e.g. 04:05:06
TIMESTAMP      Date and time, e.g. 1999-01-08 04:05:06

4.1.2 Definition of relation

A relation is defined with the SQL command CREATE TABLE, followed by the name of the relation and, in parentheses, the list of attribute definitions separated by commas. Each attribute is defined by its name, followed by its type and other attribute properties.

Example of defining the relation STUDENT:

    CREATE TABLE STUDENT (
        surname VARCHAR(50),
        name    VARCHAR(50),
        index   VARCHAR(10),
        year    INT,
        module  VARCHAR(10),
        PRIMARY KEY (index)
    );

The PRIMARY KEY clause, which lists the attributes forming the key of the relation, is placed at the end of the definition.
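
The effect of the PRIMARY KEY clause can be observed directly. The following sketch uses Python's built-in `sqlite3` module with an in-memory database, which is an assumption of the example and not the DBMS used in the slides. The column `index` is written in double quotes here because INDEX is also an SQL keyword; the second inserted row is made up.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE student (
        surname VARCHAR(50),
        name    VARCHAR(50),
        "index" VARCHAR(10),
        year    INT,
        module  VARCHAR(10),
        PRIMARY KEY ("index")
    )
""")
con.execute("INSERT INTO student VALUES ('BABIĆ', 'GORDAN', 'F-2523', 1, 'pfi')")

# A second tuple with the same primary key value must be rejected.
try:
    con.execute("INSERT INTO student VALUES ('X', 'Y', 'F-2523', 2, 'pfi')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = con.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(duplicate_rejected, row_count)
```

This is the key property from section 2.8 enforced by the DBMS: the relation cannot contain two tuples with the same value of the primary key.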

4.1.2 Definition of relation (cont.)

The second example defines the relation which contains course data:

    CREATE TABLE COURSE (
        name       VARCHAR(50),
        surname    VARCHAR(50),
        kid        INT,
        title      VARCHAR(100),
        hours1     INT,
        exercises1 INT,
        hours2     INT,
        exercises2 INT,
        PRIMARY KEY (kid)
    );

A relation is deleted from the database with the command DROP TABLE <name_relation>.

4.1.3 Data input

New data are entered into a relation with the SQL command INSERT INTO, followed by the name of the relation and, optionally, the list of attribute names in parentheses, then the word VALUES and the list of attribute values in parentheses.

Example of inserting students into the relation STUDENT:

    INSERT INTO STUDENT VALUES ('ANTOLIĆ', 'ANITA', 'F-1961', 2, 'pfi');
    INSERT INTO STUDENT VALUES ('ANTOLKOVIĆ', 'VLATKA', 'F-1761', 2, 'pfi');
    INSERT INTO STUDENT VALUES ('BABIĆ', 'GORDAN', 'F-2523', 1, 'pfi');
    ....

The list of attribute names does not have to be specified if the list of values follows the order of attributes in the definition of the relation. Text values are enclosed in single quotes; numeric values are not. If no value is given for an attribute, the attribute takes the default value specified when the relation was defined, or the NULL value (empty).

4.1.4 Update/change of data

Data are updated with the SQL command UPDATE.

Problem: Change the first name of prof. K. Furić, who teaches course No. 2362, to "Krešimir"!

    UPDATE course SET name = 'Krešimir' WHERE kid = 2362;

The word SET is followed by the list of attributes to be updated, with their new values, separated by commas.

4.1.5 Deleting data

Data are deleted with the command DELETE FROM.

Problem: Delete the elective course "History of Informatics" (2404) from the PFI study programme!

    DELETE FROM study_pfi WHERE kid = 2404;
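
The three data-changing commands can be run end to end on a small table. This is a minimal sketch with Python's built-in `sqlite3` module; the reduced `course` table, the `DEFAULT 0` on `hours1` and the second lecturer ('A.', 'Example') are assumptions made for the example, and the deletion is done on `course` itself rather than on `study_pfi`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE course (
        name    VARCHAR(50),
        surname VARCHAR(50),
        kid     INT PRIMARY KEY,
        title   VARCHAR(100),
        hours1  INT DEFAULT 0
    )
""")

# Attribute list given explicitly: title is omitted and becomes NULL,
# hours1 is omitted and takes its default value 0.
con.execute("INSERT INTO course (name, surname, kid) VALUES ('K.', 'Furić', 2362)")
con.execute(
    "INSERT INTO course (name, surname, kid, title) "
    "VALUES ('A.', 'Example', 2404, 'History of Informatics')"
)

con.execute("UPDATE course SET name = 'Krešimir' WHERE kid = 2362")
con.execute("DELETE FROM course WHERE kid = 2404")

rows = con.execute(
    "SELECT name, surname, kid, title, hours1 FROM course").fetchall()
print(rows)
```

Python shows the NULL value of `title` as `None`.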

4.1.6 Transactions

A transaction begins with the SQL command BEGIN, followed by a sequence of SQL commands which change or read the data. The sequence ends either with the command COMMIT, which makes the changes permanent, or with the command ROLLBACK, which cancels the preceding sequence of commands.

    BEGIN TRANSACTION;
    UPDATE STUDENT SET index = 'F-3342' WHERE index = 'F-3343';
    UPDATE LECTURE SET index = 'F-3342' WHERE index = 'F-3343';
    ROLLBACK TRANSACTION;

or

    BEGIN TRANSACTION;
    DELETE FROM STUDENT WHERE index = 'F-3343';
    INSERT INTO STUDENT VALUES ('Nikić', 'Nikša', 'F-3342', 4, 'pfi');
    UPDATE LECTURE SET index = 'F-3342' WHERE index = 'F-3343';
    COMMIT TRANSACTION;

4.1.7 Queries

A query is defined with the command SELECT. To eliminate duplicate tuples from the result, use SELECT DISTINCT.

Example of a simple query

Problem: Find the student course book numbers and names of all first-year students!

    SELECT index, name, surname FROM STUDENT
    WHERE year = 1
    ORDER BY surname, name;

     index  | name     | surname
    --------+----------+-----------
     F-2523 | GORDAN   | BABIC
     F-2506 | KRESIMIR | BACIC
     F-2271 | DAMIR    | BAKMAZ
     F-2143 | TIBOR    | BALI
     F-2144 | IVAN     | BEDNJANEC
     F-2356 | BRUNO    | BLAZINIC
     .....
     F-2561 | DEJAN    | ZIKOVIC
    (69 rows)

Problem: Find the student course book numbers and names of the students who signed up for course No. 1224!

    SELECT student.name, student.surname, student.index
    FROM STUDENT, LECTURE
    WHERE student.index = lecture.index AND lecture.kid = 1224
    ORDER BY student.surname;

     name     | surname   | index
    ----------+-----------+--------
     AMIR     | EL-OCH    | F-2025
     IVAN     | GLADOVIC  | F-1823
     MARIO    | KLOKOCKI  | F-1851
     MARIN    | KOSOVIC   | F-1830
     VEDRAN   | KRALJ     | F-1972
     ...
     ZRINKA   | SUMANOVAC | F-1789
     KRESIMIR | VURNEK    | F-2023
    (17 rows)

The same result can be obtained in a different way, by using a subquery:

    SELECT name, surname, index FROM STUDENT
    WHERE index IN (SELECT index FROM LECTURE WHERE kid = 1224)
    ORDER BY surname;

IN means membership in a set. This is a nested SELECT: one SELECT command inside another.

There is another alternative, using the natural join of two relations:

    SELECT name, surname, index
    FROM STUDENT NATURAL JOIN LECTURE
    WHERE kid = 1224
    ORDER BY student.surname;

Problem: Find the student course book numbers and names of the students who signed up for at least one course taught by professor Androic!

As in the previous example there are several alternatives, one using the Cartesian product of three relations and another using nested SELECT commands:

    SELECT student.name, student.surname, student.index
    FROM STUDENT, LECTURE, COURSE
    WHERE student.index = lecture.index
      AND lecture.kid = course.kid
      AND course.surname = 'Androic';

or

    SELECT name, surname, index FROM STUDENT
    WHERE index IN (
        SELECT index FROM LECTURE
        WHERE kid IN (SELECT kid FROM COURSE WHERE surname = 'Androic')
    );

Problem: Find all pairs of course book numbers of the students in the same year!

    SELECT t1.index, t2.index
    FROM STUDENT t1, STUDENT t2
    WHERE t1.index < t2.index AND t1.year = t2.year;

Here we have introduced aliases (alternative names), t1 and t2, for the relation STUDENT. Aliases are needed because this is the Cartesian product of the relation STUDENT with itself, so the attribute names would otherwise be ambiguous. The word AS can be added in front of an alias.

5 PHYSICAL DATABASE STRUCTURE

5.1 Physical structure elements

Data are stored on magnetic disks. To understand the physical database structure one should know what
• records
• files
• pointers
are. These represent a very low level of abstraction, very close to reality.

(Figure: University data file)

5.1.1 External computer memory

• The OS divides the external memory into blocks of fixed size, e.g. 512 bytes or 4096 bytes
• Each block is uniquely identified by its address
• The basic external memory operation is the transfer of a block with a given address from external memory to main memory and vice versa
• The part of main memory which participates in the transfer is called a buffer
• A block is the smallest amount of data which can be transferred
• The time needed for a transfer is not constant; it depends on the position of the disk head
• External memory operations take milliseconds (ms)
• Main memory operations take nanoseconds (ns)

(Figure: File in blocks with related records)

5.1.2 Files

• At this level a file is a finite sequence of records of the same type stored in external memory
• A record type is defined as a tuple of basic data items (each described by name and type)
• Records are of fixed length (a record holds a specific value for each attribute, and each value is stored in a fixed number of bytes).

Typical operations performed on files are:
• insert a new record
• modify a record
• remove a record
• find the record or records in which the given data items have the given values

5.1.3 External memory files

• A record is usually smaller than a block (several records fit in a block)
• A record address is structured as an ordered pair (block address, offset within the block)
• In some file organisations, some spaces in a block may be left empty
• How do we distinguish full and empty spaces?
  • Extend the record with one bit, which denotes whether the space is "empty" or "full"
• Sometimes a record has to be "deleted" while its space remains occupied; this requires one more bit in the record, which shows whether the record is "valid" or "not valid"
• The entire file usually takes up several blocks
• Because records are inserted and deleted, the external memory becomes fragmented

5.1.4 Pointers

• A pointer is a value within a record which points to another record (in the same or another file)
• A pointer can be:
  • A record address ("physical")
  • A primary key value ("logical")
• Pointers make it possible to establish connections among records
• A pointer-address enables fast access
• A pointer-key is "slow": it only implicitly identifies a record, which then has to be found
• The presence of a pointer-address may cause problems when the file is reorganised or updated
• If a pointer-address points to a record, the record is "pinned"
• The presence of a pointer-key does not cause that kind of trouble, so these pointers are used despite their "slowness"
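
The difference between the two kinds of pointers can be shown on a toy file. In the sketch below a file is modelled as a Python list of blocks and a block as a list of record slots; this in-memory layout and all names in it are assumptions made for illustration, with sample records borrowed from the student example in section 1.8.

```python
# A toy "file": a list of blocks, each block a list of record slots.
# None marks an empty slot.
blocks = [
    [{"key": 23401, "name": "Marko"}, {"key": 23402, "name": "Ana"}],
    [{"key": 23403, "name": "Iva"}, None],
]

def follow_address(pointer):
    """Physical pointer: (block address, offset within the block)."""
    block_no, offset = pointer
    return blocks[block_no][offset]

def follow_key(key):
    """Logical pointer: a primary key value; the record has to be searched for."""
    for block in blocks:
        for record in block:
            if record is not None and record["key"] == key:
                return record
    return None

physical = (0, 1)   # points at the slot holding Ana's record
logical = 23402     # Ana's primary key

# Reorganise the file: move Ana's record to the free slot in block 1.
blocks[1][1] = blocks[0][1]
blocks[0][1] = None

stale = follow_address(physical)   # the old slot is now empty
found = follow_key(logical)        # the key still leads to the record
print(stale, found)
```

After the move the physical pointer leads to an empty slot, which is why a record with a pointer-address pointing to it is "pinned". The logical pointer still finds the record, at the cost of a search.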

5.1.5 Physical database structure

• The entire database is structured as a set of files
• Database records can be connected with one another by pointers
• All database operations are operations on database files
• In a relational database, every relation is represented by one file
• The attributes of a relation correspond to the basic data items of a record
• One tuple of a relation is represented by one record
• The primary key of a relation defines the primary key of the file
• The physical structure of a relational database can also contain additional ancillary files, which make searching and connecting the data faster. Indices are an example of such files.

5.2 File access based on the primary key

• A very important operation on files is access based on the primary key
• The address of the record (there is at most one) which contains the given value of the primary key has to be determined
• The following slides present file organisations which support access based on the primary key

5.2.1 Simple file

• Absence of any kind of structure
• Data records are put in as many blocks as needed
• The blocks which form a file may be connected:
  1. in a linked list (every block contains the address of the following block)
  2. by an address table of all blocks (which occupies the first block or the first few blocks)
• Searching for the record with a given key value requires reading the entire file
• A new record is placed in the first free space in the first unfilled block
• Records can be modified without any limitations

(Figure: simple file)

5.2.2 Hash file

• File records are put into P buckets, numbered 0, 1, 2, …, P-1
• Each bucket consists of one or more blocks
• A given hash function h yields the number h(k) of the bucket in which the record with key value k should be located
• The set of possible key values is usually much larger than the number of boxes
• h should distribute the key values over the boxes uniformly
• Example of a good hash function:
  • Key values are seen as bit sequences of a fixed length;
  • The bit sequence is divided into groups of a fixed length; zeros are added to the last group if needed;
  • The groups are added together as integers;
  • The sum is divided by the number of boxes;
  • The remainder after the division is the number of the required box.

5.2.3 Index file
• An index is a small ancillary file which facilitates searching in a large (main) file. Two variants of index files are presented:
  a. Index-sequential file organisation
  b. Index-direct file organisation

a. Index-sequential file organisation
• Records in the main file must be sorted by key value
• Blocks do not have to be completely filled
• A so-called sparse index is added
• Each index record corresponds to one block of the main file
• An index record has the form (k, a): k is the smallest key value in the block, a is the block address

b. Index-direct file organisation
• Allows the records in the main file to be stored in arbitrary order
• A so-called dense index is added
• Each index record corresponds to one record of the main file
• An index record has the form (k, a): k is the key value of a main file record, a is the address of (pointer to) that record
• The main file is not sorted
• The index is sorted by key
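The two index organisations can be compared with a short Python sketch. It rests on simplifying assumptions that are not part of the slides: a record is reduced to its key, blocks are small Python lists, and the names find_sparse and find_dense are hypothetical. The standard library function bisect_right does the binary search in the index.

```python
from bisect import bisect_right

# Main file for the index-sequential case: blocks sorted by key (toy data).
sorted_blocks = [[3, 7], [12, 15], [21, 30]]
# Sparse index: one (smallest key in block, block number) pair per block.
sparse_index = [(block[0], i) for i, block in enumerate(sorted_blocks)]


def find_sparse(key):
    """Return (block number, offset) of key, or None if it is absent."""
    first_keys = [k for k, _ in sparse_index]
    pos = bisect_right(first_keys, key) - 1  # last block whose first key <= key
    if pos < 0:
        return None
    block_no = sparse_index[pos][1]
    block = sorted_blocks[block_no]          # a single block has to be read
    return (block_no, block.index(key)) if key in block else None


# Main file for the index-direct case: records stored in arbitrary order.
unsorted_blocks = [[21, 3], [30, 12], [7, 15]]
# Dense index: one (key, record address) pair per record, sorted by key.
dense_index = sorted(
    (key, (b, o))
    for b, block in enumerate(unsorted_blocks)
    for o, key in enumerate(block)
)


def find_dense(key):
    """Return the record address (block number, offset) of key, or None."""
    keys = [k for k, _ in dense_index]
    pos = bisect_right(keys, key) - 1
    if pos >= 0 and dense_index[pos][0] == key:
        return dense_index[pos][1]
    return None
```

The sparse index has one entry per block and relies on the main file being sorted; the dense index has one entry per record, so the main file may stay unsorted, but the index itself is larger.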
5.2.4 B-tree
• To search large main files efficiently, today’s DBMSs use a B-tree
• A B-tree of order m is an m-ary tree with the following properties:
  • The root is either a leaf or has at least two children
  • Each node, except the root and the leaves, has between m/2 and m children
  • All paths from the root to a leaf are of the same length

Let the leaves of a B-tree contain n pairs of the form (k, a), and let one leaf contain on average b pairs (k, a). The number of block reads needed for key-based access is proportional to the height of the tree, which is at most about $\log_{m/2}(n/b)$. In practice m and b can be large, so the B-tree usually has only a few (3-4) levels.

Conclusion: access through a B-tree index is roughly as fast as access to a hash file.

[Figure: inserting the value 23 into a B-tree]

6 IMPLEMENTATION OF RELATIONAL OPERATIONS

So far we have been talking about the “static” aspect of the physical database structure. A relational database also has a “dynamic” aspect, i.e. the evaluation of relational algebra expressions. Within a relational DBMS an algebraic expression is evaluated step by step, and the basic step is the evaluation of a single algebraic operation. We will discuss the implementation of the three most important topics:
1. Natural join
2. Selection and projection
3. Optimal evaluation of the entire expression

6.1 Implementation of natural join
We observe the relations R1(A,B) and R2(B,C) with the common attribute B. Let S(A,B,C) denote the natural join of R1 and R2. Each of these three relations is physically represented by one file of the same name. We consider several ways of generating file S from files R1 and R2.

6.1.1 Nested loop algorithm
It is the most obvious, although not necessarily the most efficient way.
The idea is:

    initialise empty S;
    load the first record from R1;
    while (the end of R1 has not been reached) {
        load the first record from R2;
        while (the end of R2 has not been reached) {
            if (the current records from R1 and R2 contain the same value for B)
                create a composite record and write it into S;
            try to load the next record from R2;
        }
        try to load the next record from R1;
    }

6.1.2 Algorithm based on sorting and merging
Suppose that files R1 and R2 are sorted in ascending order by the common attribute B. Here a “group of records” means all records of one file that have the same value for B. File S, which represents the natural join of R1 and R2, can be generated by the following algorithm:

    initialise empty S;
    load the first group of records from R1;
    load the first group of records from R2;
    while (the end of neither R1 nor R2 has been reached) {
        if (the current group from R1 has a lower value for B than the current group from R2)
            try to load the next group of records from R1;
        otherwise if (the current group from R2 has a lower value for B than the current group from R1)
            try to load the next group of records from R2;
        otherwise {
            combine each record from the current group from R1 with each record
            from the current group from R2 and write all generated records into S;
            try to load the next group of records from R1;
            try to load the next group of records from R2;
        }
    }

Conclusion: If R1 and R2 are not sorted at the outset, they first have to be sorted, and only then can the natural join be calculated. For smaller files that fit into main memory this is not a problem; a large file, however, has to be divided into segments for sorting. Sorting the initial R1 and R2 takes considerably longer than generating S from the already sorted R1 and R2. The procedure remains efficient even if R1 and R2 are very big.

6.1.3 Index based algorithm
Suppose that one of the files R1 and R2, e.g. R2, has a secondary index on the common attribute B. Then file S, which contains the natural join, can be generated in the following way:

    initialise empty S;
    load the first record from R1;
    while (the end of R1 has not been reached) {
        use the index to find and load all records from R2 which have
        the same value for B as the current record from R1;
        combine the current record from R1 with each of the loaded records
        from R2 and write the generated records into S;
        try to load the next record from R1;
    }

Conclusion: The algorithm reads the entire R1 once. From R2, however, it reads directly only those records that take part in the natural join. That can save a significant amount of work. If both R1 and R2 have an index on B, the smaller file should be read sequentially and the index of the bigger file should be used.

6.1.4 Algorithm based on a hash function and classification
A hash function h which depends on the common attribute B is given. The combination of a given record from file R1 with a given record from file R2 appears in the natural join S if and only if both records have the same value for B. The hash function therefore gives the same value for both such records. By classifying the records from R1 and R2 into groups with nearby values of h, it becomes easier to determine the pairs which may be combined.

Suppose R1 is smaller than R2. The algorithm consists of 5 steps:
1. Initialise an empty file S. Select the hash function h. Divide the total range of hash values into k intervals of similar size. k is selected in such a way that 1/k of file R1 fits into main memory.
2. Read R1 sequentially and classify its records into k groups, so that one group contains all records from R1 which h maps into the same interval.
3. Read R2 sequentially and classify its records into k groups, in the same way as was done with file R1.
4. Select one of the intervals of values of h. Load the corresponding record group from R1 into main memory. Read the corresponding record group from R2 sequentially. Combine the current record from R2 with all records from R1 which have the same value for B. Write the obtained records into S.
5. Repeat step 4 for each of the remaining intervals of values of h.

6.1.5 Natural join implementation: conclusion
Comparison of the four presented methods for implementing the natural join:
• If the files R1 and R2 are already sorted by the common attribute B, the most efficient algorithm is the one based on sorting and merging.
• If one of the files is small enough to fit into main memory, the nested loop algorithm should be selected.
• If one file is considerably bigger than the other and has the corresponding index, the best algorithm is the index based one.
• For large files R1 and R2 without an index, the best algorithm is the one based on a hash function and classification.

6.2 Implementation of selection, projection and other operations
Apart from the natural join, the most important relational operations are selection and projection. In this part we present the main ideas for implementing these two operations, and then briefly mention how other operations are implemented.

6.2.1 Implementation of selection
A relation R and a Boolean condition β are given. R is physically represented by the file of the same name, in the standard way. The implementation of the selection R where β depends on the type of the condition β, but it usually comes down to searching file R for records with a given value of some data item. So it is usually access based on the primary key or access based on other data.

6.2.2 Implementation of projection
Relation R and its attribute A are given.
R is physically represented by the file of the same name, in the standard way. In order to generate the file that corresponds to S=R[A], the whole file R obviously has to be read and all values of A that appear in it extracted. The same value of A may appear more than once. The basic problem of implementing projection is how to eliminate duplicate records in S.

The simplest algorithm for implementing projection is based on nested loops. The outer loop reads file R, and the inner loop passes through the part of file S created so far. If R is large, the algorithm with nested loops requires too much time. It is then better to extract all values of A which appear in R and sort the sequence of extracted values. Duplicates are then eliminated in one sequential pass.

6.2.3 Implementation of other operations
The Cartesian product of two relations R1 and R2 is implemented by a nested loop. The union of two relations R1 and R2 is implemented in the obvious way; as with projection, duplicate records have to be eliminated. The intersection of two relations can be understood as a special case of the natural join in which all attributes are common. Other relational operators can be expressed using the ones already presented, so they are usually not implemented individually.

6.3 Optimal evaluation of algebraic expressions
The evaluation of an expression comes down to the evaluation of each basic operation. These are usually the operations of natural join, selection, projection and perhaps some others. For each basic operation the DBMS has several algorithms available. Using parameters such as the cardinality of the relations, the size of main memory and the existence or non-existence of certain indices, the DBMS estimates, for the given operation and for each of the available algorithms, the time required to perform the operation with that algorithm. The estimates are based on built-in heuristic rules.
The DBMS then selects the algorithm with the smallest estimated time for the given operation. A relational DBMS is thus an example of an expert system, i.e. software which has some characteristics of artificial intelligence.

7 DATA INTEGRITY AND DATA SECURITY

7.1 Integrity preservation
The term refers to preserving the accuracy and consistency of data.
Integrity can easily be disrupted by incorrect input from careless users or by incorrectly working application programmes.
The integrity of the base is taken care of by the DBMS, which allows the database designer to define so-called constraints.

7.1.1 Domain constraints
A domain constraint states that the value of an attribute has to belong to a given domain. The requirement that the value of a primary attribute must not be empty also belongs to the protection of domain integrity.
E.g. in the relation STUDENT there is the attribute DOB (Croatian for “age”) with the following constraint: it is an integer between 10 and 60. Since the list of types is limited, the closest type is SMALLINT (this prevents the value 12.5, but -5 is still possible). Some DBMSs allow more precise conditions to be defined in the CREATE statement: we declare DOB with type SMALLINT and the additional condition (DOB>=10) and (DOB<=60).

7.1.2 Constraints within a relation
The correct dependencies between attributes within a relation have to be preserved (e.g. functional dependencies). The most important example of such a constraint is the requirement that two tuples within the same relation must not have the same key value.
E.g.

    CREATE TABLE STUDENT
        (S_ID INTEGER NOT NULL,
         S_NAME CHAR(20),
         S_DOB SMALLINT,
         PRIMARY KEY (S_ID));

7.1.3 Referential integrity constraints
The accuracy and consistency of the connections among relations have to be preserved. These are the constraints which refer to a foreign key, i.e. to an attribute in one relation which is at the same time the primary key in another relation. Newer SQL standards provide the FOREIGN KEY clause in the CREATE command, with which such a constraint is specified.

    CREATE TABLE REPORT
        (S_ID INTEGER NOT NULL,
         P_ID INTEGER NOT NULL,
         I_MARK SMALLINT,
         PRIMARY KEY (S_ID,P_ID),
         FOREIGN KEY(S_ID) REFERENCES STUDENT,
         FOREIGN KEY(P_ID) REFERENCES PROGRAM);

7.1.4 Integrity preservation: conclusion
Today’s DBMSs support various kinds of constraints, but it is not possible to express every conceivable constraint. On the other hand, each constraint is a burden during data update. We should not overdo it with constraints.

7.2 Simultaneous access
The DBMS should enable users to access data simultaneously (multi-user work). It is usually a virtual simultaneity (time sharing of the same computer). In such a case the DBMS must coordinate conflicting actions very carefully. Each user must have the impression that he or she is the only one using the database.

7.2.1 Transactions
A user works with the database via transactions. A transaction brings the base from one consistent state into another consistent state (the states after individual operations within the transaction may be inconsistent). To preserve database integrity, a transaction must either be fully completed or not be performed at all. In a multi-user base several transactions are executed simultaneously.

7.2.2 Serialisability
Basic operations which belong to different transactions are intermingled in time. Serialisability is the property that the effect of the simultaneous execution of transactions is equivalent to the effect of some serial execution, in which the transactions run one after another.
E.g. two travellers arrive at two different airline agencies at the same time and want to buy a ticket for the same flight. At that moment there is only one vacant seat. (Without coordination both agencies could sell that seat, and two passengers would be sitting on the same seat.)
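The airline example can be replayed with a small Python sketch. It is a simplified model under stated assumptions, not a description of how a DBMS works internally: each transaction consists of just two steps, “read” and “book”, the agencies are called A and B, and the function run and the two schedules are hypothetical names introduced for the illustration.

```python
def run(schedule):
    """Execute a schedule given as a list of (agency, step) pairs.

    Each agency's transaction has two steps: "read" the number of free
    seats, then "book" (sell a ticket if the value it read was positive).
    """
    free_seats = 1   # the shared datum in the database
    seen = {}        # private copy read by each transaction
    tickets = []     # agencies that sold a ticket
    for agency, step in schedule:
        if step == "read":
            seen[agency] = free_seats
        elif step == "book" and seen[agency] > 0:
            free_seats = seen[agency] - 1
            tickets.append(agency)
    return free_seats, tickets


# Serial execution: transaction A completes before B starts.
serial = [("A", "read"), ("A", "book"), ("B", "read"), ("B", "book")]
# Interleaved execution: both read the seat count before either books.
interleaved = [("A", "read"), ("B", "read"), ("A", "book"), ("B", "book")]

serial_result = run(serial)
interleaved_result = run(interleaved)
```

In the serial schedule only one ticket is sold. In the interleaved schedule both agencies sell a ticket for the single seat, an outcome that no serial order of the two transactions can produce, so this schedule is not serialisable.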
7.2.3 Locks and locking
Locks are ancillary data which are used to coordinate conflicting actions. The base is divided into several parts, and one lock corresponds to each part. A transaction which wants to access a datum must first “take” the corresponding lock and thereby lock the respective part of the base. As soon as the operation has been executed, the transaction must “return” the lock and thereby unlock the data. When a transaction comes across data which are already locked, it must wait until the previous transaction unlocks them. The size of the part of the base that one lock protects defines the locking granularity. The use of locks carries a danger: two or more transactions may block one another (deadlock).

7.2.4 Two-phase locking protocol
If in every transaction all locking operations happen before the first unlocking operation, then any parallel execution of these transactions is guaranteed to be serialisable. This is the two-phase locking protocol.
E.g. two passengers want to switch places on a plane.

7.2.5 Timestamps
The two-phase locking protocol is based on locks, but there are also methods which do not use locks. One such technique is based on timestamps. A transaction identification number, the so-called timestamp, is assigned to each transaction. Reading and changing the same datum is allowed only if these operations are executed in the order corresponding to the order of the transaction timestamps. If the order is violated, one of the transactions must be stopped, neutralised and started again with a larger timestamp.
E.g. if T1 has timestamp t1, T2 has timestamp t2 > t1, T1 wants to read datum x, and T2 has already changed the same x, then T1 must be neutralised.
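The timestamp rule can be sketched in a few lines of Python. The sketch assumes the usual bookkeeping of basic timestamp ordering, which the slides do not spell out: every datum remembers the largest timestamp of a transaction that read it and the timestamp of the transaction that last changed it. The names Datum, Abort, read and write are hypothetical.

```python
class Abort(Exception):
    """Raised when a transaction must be neutralised and restarted."""


class Datum:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0    # largest timestamp of a transaction that read it
        self.write_ts = 0   # timestamp of the transaction that last changed it


def read(datum, ts):
    # A transaction with a larger timestamp has already changed the datum,
    # so this read would violate the timestamp order.
    if ts < datum.write_ts:
        raise Abort(f"transaction {ts} must be neutralised (read too late)")
    datum.read_ts = max(datum.read_ts, ts)
    return datum.value


def write(datum, ts, value):
    # A transaction with a larger timestamp has already read or changed it.
    if ts < datum.read_ts or ts < datum.write_ts:
        raise Abort(f"transaction {ts} must be neutralised (write too late)")
    datum.value = value
    datum.write_ts = ts


# The example from the text: t1 = 1 < t2 = 2; T2 changes x, then T1 reads x.
x = Datum(100)
write(x, 2, 150)
try:
    read(x, 1)
    t1_neutralised = False
except Abort:
    t1_neutralised = True
```

A neutralised transaction would then be restarted with a new, larger timestamp, after which its read of x is accepted.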
7.3 Recovery
During its work the database might find itself in an “incorrect” state for the following reasons:
• A transaction is aborted (ROLLBACK WORK statement, power failure…)
• The transaction itself works incorrectly
• DBMS or operating system errors
• Hardware error or physical damage to the computer
The DBMS is expected to enable “recovery of the base” in all these cases.

Apart from the base itself, the DBMS must maintain some ancillary resources, which are:
1. Backup copy
2. Journal file (log file)
Different types of recovery are possible, such as:
• Neutralisation of interrupted or erroneous transactions
• Re-establishing the base after it has been considerably damaged

7.3.1 Backup database copy
The entire base is stored on another medium (magnetic tape or another disk). This is done when the base is in a consistent state. While copying is in progress, transactions that change data must not be executed. Because it is a long-running operation, it is executed when the database users are not present, at intervals defined in advance.

7.3.2 Journal file
It is a “historic” file: every transaction since the last backup copy of the database is recorded in it. For one transaction the journal records:
• The transaction identifier
• The address of each datum changed by the transaction, together with the previous value and the new value of the datum
• The control points of the transaction: start, commit and rollback

7.3.3 Transaction neutralisation
It is a frequently executed operation, performed by the DBMS automatically. The data changed by the transaction are given back their previous values. The procedure is called roll-back:
• The journal is read and the old values of the data changed by the transaction are found
• These old values are written back to the appropriate places in the base.
What happens if some other transaction has read a value written by the transaction which has just been neutralised? Obviously, this second transaction should also be neutralised. The procedure would be rather complex, because it involves the coordination of several transactions carried out simultaneously. If the DBMS executes only one transaction at a time, transaction neutralisation can be carried out in two phases using the deferred write technique. With this technique data changes are not written into the base immediately, but only after the commit point of the transaction has been entered in the journal.

[Figure: transaction → changes → journal → commit → base]

7.3.4 Re-establishing the base
This is an extraordinary and comprehensive operation, executed by the base administrator. It is called roll-forward and consists of re-applying all recorded changes. The procedure is the following:
• Restore the database state recorded by the last backup copy
• Determine the final (specific) control point in the journal
• Read the part of the journal from the beginning up to that control point
• Re-apply to the database the changes of each committed transaction from the observed part of the journal
After that, the database state which corresponds to the final control point is established.

7.4 Protection against unauthorised access
This refers to the part of the software embedded in the DBMS which takes care of data protection. It is used to restrict database access to authorised users. It consists of the following:
• User identification
• Views as protection mechanisms
• Authorisations

7.4.1 User identification
A username and a password are assigned to each database user. A user must introduce himself/herself to the DBMS with the username and prove his/her identity with the password. The DBMS keeps the list of usernames and the corresponding passwords. Protection is based on the confidentiality of the passwords.
7.4.2 View as protection mechanism
Views are a means of achieving logical independence (subschema). A view also serves as protection, because it lets an individual user see only a part of the data stored in the database. In SQL a view is defined with the CREATE VIEW command; the view is derived from the global relations by the SELECT command nested in the CREATE VIEW command. The following examples show how, for a given global schema, views are used for protection against unauthorised users.

Example 1
    EMPLOYEES(Z_ID,Z_NAME,Z_SALARY,Z_ADDRESS,Z_DEPARTMENT)
    DEPARTMENT(O_ID,O_NAME,O_ID_HEAD);
The views are described with the SQL command CREATE VIEW. A view through which a user can access the employee data, but not their salaries:

    CREATE VIEW EMPL1 AS
        SELECT Z_ID,Z_NAME,Z_ADDRESS,Z_DEPARTMENT
        FROM EMPLOYEES;

Example 2
A view through which a user can access the data about employees, but only about those employed in department No. 3:

    CREATE VIEW EMPL2 AS
        SELECT * FROM EMPLOYEES
        WHERE Z_DEPARTMENT=3;

7.4.3 Authorisation
A view determines which data a user can see, but not what the user may do with them. Apart from views, the DBMS therefore associates authorisations with each user. The most frequent authorisations are the following:
SELECT: authorisation to retrieve data from the given relation (view);
INSERT: authorisation to insert new tuples into the given relation (view);
DELETE: authorisation to delete tuples from the given relation (view);
UPDATE: authorisation to change data in the given relation (view);
ALTER: authorisation to change the structure of the given relation (e.g. adding new attributes);
CONNECT: authorisation for a user to register for working with the base;
DBA: authorisation which gives a user the status of base administrator.

8 EXAMPLES – RELATIONAL ALGEBRA AND RELATIONAL CALCULUS

8.1 Relational algebra operations
Relational algebra is a query language for relational databases.
It consists of relational operations which, based on one or more relations, calculate a new relation. The usual operations are:
• union (set union)
• intersect (set intersection)
• minus (set difference)
• where (selection)
• times (Cartesian product)
• join (natural join)
• divideby (division)
• [ ] (projection)

8.1.1 Gourmand queries
We observe a database of gourmands, dishes and restaurants, whose ER schema is shown in the diagram below.

[Figure: ER schema of the gourmand database, with the entities GOURMAND, RESTAURANT and DISH and the relationships VISITS, SERVES and LOVES]

Assumption: only the entity names, without any additional data, are stored in the database. The relational schema then looks as follows:
    VISITS (GOURMAND, RESTAURANT),
    SERVES (RESTAURANT, DISH),
    LOVES (GOURMAND, DISH).
Task: write the following queries for the given gourmand database in relational algebra.

Query 1: Make a list of all restaurants that serve a dish that gourmand Joe likes.
Solution:
    JOE’S_DISH := (LOVES where GOURMAND = “Joe”) [DISH];
    JOE’S_REST := (JOE’S_DISH join SERVES) [RESTAURANT].

Query 2: Make a list of all gourmands who visit at least one restaurant that serves a dish they like.
Solution:
    RESULT := ((LOVES join SERVES) join VISITS) [GOURMAND].
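The two gourmand queries can be checked on a small example in Python, with each relation modelled as a set of tuples. The sample data are invented purely for illustration, and the set comprehensions below are only one possible way to mimic where, join and projection.

```python
# Relations as sets of tuples; the sample data are invented for illustration.
LOVES = {("Joe", "pizza"), ("Joe", "soup"), ("Ann", "salad")}          # (GOURMAND, DISH)
SERVES = {("Roma", "pizza"), ("Bistro", "salad"), ("Bistro", "soup")}  # (RESTAURANT, DISH)
VISITS = {("Joe", "Bistro"), ("Ann", "Roma")}                          # (GOURMAND, RESTAURANT)

# Query 1: restaurants that serve a dish Joe likes.
# Selection on GOURMAND, projection on DISH, join with SERVES on DISH,
# projection on RESTAURANT.
joes_dish = {dish for gourmand, dish in LOVES if gourmand == "Joe"}
joes_rest = {rest for rest, dish in SERVES if dish in joes_dish}

# Query 2: gourmands who visit a restaurant that serves a dish they like.
# (LOVES join SERVES) matches on DISH; the join with VISITS then matches
# on both GOURMAND and RESTAURANT.
loves_serves = {(g, d, r) for g, d in LOVES for r, d2 in SERVES if d == d2}
result = {g for g, d, r in loves_serves if (g, r) in VISITS}
```

With these data Joe likes pizza and soup, so both restaurants qualify for Query 1. For Query 2 only Joe qualifies: Ann likes salad, which is served only at Bistro, a restaurant she does not visit.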
Query 3: Make a list of all restaurants that serve all the dishes Joe likes.
Solution: Suppose the divideby operation is not available.
    JOE’S_DISH := (LOVES where GOURMAND = “Joe”) [DISH];
    ALL_REST := SERVES [RESTAURANT];
    ALL_COM := JOE’S_DISH times ALL_REST;
    NOT_JOE’S := (ALL_COM minus SERVES) [RESTAURANT];
    JOE’S_REST := ALL_REST minus NOT_JOE’S.

8.1.2 Library queries
We are looking at a library database. There are relations about books, members and book loans. The relational schema looks as follows:
    BOOK (CAT_ID, C_TITLE, C_AUTHOR, C_PUBLISHER),
    MEMBER (MEMBER_ID, CL_NAME, CL_ADDRESS),
    LOAN (LOA_CAT_ID, LOA_MEMBER_ID, LOA_DATE)
Write the following queries in relational algebra based on this relational schema. In the joins below, LOA_CAT_ID and LOA_MEMBER_ID are taken to correspond to CAT_ID and MEMBER_ID respectively, i.e. they act as the common attributes.

Query 1: Find the titles and authors of all books published by “Prentice_Hall”.
Solution:
    RESULT := (BOOK where C_PUBLISHER = “Prentice_Hall”) [C_TITLE, C_AUTHOR]

Query 2: Find the titles of all books that were borrowed on 15 February 2010.
Solution:
    TEMP := (LOAN where LOA_DATE = “15-FEB-2010”);
    RESULT := (BOOK join TEMP) [C_TITLE]

Query 3: Write the title and author of every book published by “Prentice_Hall” which was lent to the member named “Ivan Ivić” before 21 July 2007.
Solution:
    TEMP1 := (MEMBER where CL_NAME = “Ivan Ivić”) [MEMBER_ID];
    TEMP2 := ((TEMP1 join LOAN) where LOA_DATE < “21-JUL-2007”) [CAT_ID];
    RESULT := ((TEMP2 join BOOK) where C_PUBLISHER = “Prentice_Hall”) [C_TITLE, C_AUTHOR]

Query 4: Write the titles and authors of all books that have never been lent.
Solution:
    TEMP := BOOK [CAT_ID] minus LOAN [LOA_CAT_ID];
    RESULT := (TEMP join BOOK) [C_TITLE, C_AUTHOR].

8.1.3 Proving equivalence
Often the same query can be written as different algebraic expressions.
The proof that two relational algebra expressions are equivalent is conducted in the following way:
• First, it is shown that any tuple that belongs to the value of the first expression must also belong to the value of the second expression.
• Then, it is shown that any tuple that belongs to the value of the second expression must also belong to the value of the first expression.

8.2 Use of relational calculus
Relational calculus is another query language for relational databases. It is based on the notation of mathematical logic, namely predicate calculus.
There are two versions:
• Tuple-oriented version: variables represent complete tuples
• Domain-oriented version: variables represent the values of individual attributes
A query is formed by writing a predicate which the tuples (or the attribute values) must satisfy.
Compared to relational algebra, relational calculus is mostly “non-procedural”: a query only defines the result we want to obtain, without defining the procedure for obtaining it.

9 LITERATURE
Compulsory literature:
1. Ž. Knok: lecture notes (skripta), MEV, Čakovec, 2010.
2. M. Radovan: Baza podataka, Informator, Zagreb, 1993.
Additional literature:
1. S. Tkalac: Relacijski model podataka, Informator, Zagreb, 1988.
2. J.D. Ullman: Database and Knowledge-Base Systems, Computer Science Press, 1999.