The Relational Data Model

Duty 5 Designing a Relational Database
The Basics of Relational Database
The relational system is the most widely used database system on PCs. The relational database
model has become the de facto standard for the design of databases both large
and small.
The relational database was developed by a mathematician, Dr. E. F. Codd,
in the early 1970s.
He used the concept of a relation in mathematics as the basis of a structured
method of storing and retrieving data. The background on which a relational
database was formed is as follows:
“A relation is a two-dimensional table, consisting of a collection of rows and
columns.
A series of these tables make up a database.
A table can be likened to a file in a conventional file-based system, and each
column is like a field. A row can then be compared to a record in the file.”
In short, relational database systems are database systems that use the
relational data model. The following sections discuss the relational
model and its building blocks in more detail. A relational system is a system
in which the data is perceived by the user as tables and the user applies certain operators
to work on the data. The relational model is divided into three parts, having to do
with objects, integrity and operators.
Relational Objects
The relational data objects are domains and relations.
Domains
In this model, the smallest semantic units of data are called scalars. A scalar cannot be
decomposed (broken apart) further without losing meaning. For example, a sex
field may have the values F and M, which are scalars. An address column, however, may be made to hold
Woreda, Kebele and house-number information. In this case, the address data
cannot be called a scalar, as it can be further decomposed into three data units, each
of which can represent a separate entity.
A domain is a pool of values from which specific attributes of specific relations
draw their actual values. A domain can also be defined as a set of scalar values
of the same type. For example: Domain of cities (Addis Ababa, Jimma, Diredawa,
Lekempt, Assosa); Domain of courses (English, Amharic, Database management,
Geography).
The domain concept is not supported by many RDBMSs (Relational Database
Management Systems); built-in data types are used instead. Domains can be used for
refusing invalid entries and facilitating comparisons. The data definition for
domains takes the form:
Create domain name definition
To change an existing domain definition we can use:
Alter domain name definition
A domain can be removed from the system using:
Destroy domain name
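The two uses of domains mentioned above (refusing invalid entries and deciding whether a comparison makes sense) can be sketched in a few lines of Python. This is only an illustration: the names `CITIES`, `SEX`, `in_domain` and `comparable` are assumptions made for the example, and a domain is modelled as a plain set of scalar values.

```python
# Hypothetical domains, each modelled as a set of scalar values.
CITIES = {"Addis Ababa", "Jimma", "Diredawa", "Lekempt", "Assosa"}
SEX = {"F", "M"}


def in_domain(value, domain):
    """Return True when the value belongs to the domain (a valid entry)."""
    return value in domain


def comparable(domain_a, domain_b):
    """Treat two values as comparable only when they draw on the same domain."""
    return domain_a is domain_b


print(in_domain("Jimma", CITIES))  # a city taken from the domain of cities
print(in_domain("X", SEX))         # an entry the sex domain would refuse
print(comparable(CITIES, SEX))     # comparing a city with a sex value
```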
Relations
A relation is made up of attributes (columns) and tuples (rows). The terms relation,
tuple and attribute are the formal counterparts of the more familiar terms table, row
and column (or table, record and field). In the discussion to follow, for the sake of
simplicity, we will avoid the formal relational terms and use their familiar
equivalents instead.
A table is composed of a predefined number of columns and a changeable number
of rows. Each column in a table draws its values from a domain defined earlier.
The table can be viewed as a mathematical set made up of the table header and the table body.
The table header consists of <column name: respective domain name> pairs,
whereas the table body contains a set of rows, each a set of <column name: value> pairs. For
example, suppose we have the following table:
Name    Age  Sex
Abebe   23   M
Kebede  45   M
Zahra   26   F
Ahmed   53   M
The table definition requires a definition of each column and its associated
domain. The domains for this table may be defined as:
Create domain dname character(20)
Create domain dage numeric(2)
Create domain dsex character(1)
The table header definition will then look like:
{(name, dname), (age, dage), (sex, dsex)}
and the body part of the table will look like:
{(<name:'Abebe'>, <age:23>, <sex:'M'>),
 (<name:'Kebede'>, <age:45>, <sex:'M'>),
 (<name:'Zahra'>, <age:26>, <sex:'F'>),
 (<name:'Ahmed'>, <age:53>, <sex:'M'>)}
The data definition for relations takes the form:
create base relation_name
(column_definition_commalist
candidate_key_definition_list
foreign_key_definition_list)
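The header and body described above map quite directly onto Python sets. The sketch below is one possible encoding, not a DBMS implementation: the predicate functions stand in for the domains dname, dage and dsex (reading numeric(2) as an integer from 0 to 99 is an assumption), and `row_conforms` is a hypothetical helper.

```python
# Domains as simple predicates (illustrative stand-ins for dname, dage, dsex).
domains = {
    "dname": lambda v: isinstance(v, str) and len(v) <= 20,
    "dage": lambda v: isinstance(v, int) and 0 <= v <= 99,  # assumed reading of numeric(2)
    "dsex": lambda v: isinstance(v, str) and len(v) == 1,
}

# Table header: a set of (column name, domain name) pairs.
header = {("name", "dname"), ("age", "dage"), ("sex", "dsex")}

# Table body: a set of rows, each row a set of (column name, value) pairs.
body = {
    frozenset({("name", "Abebe"), ("age", 23), ("sex", "M")}),
    frozenset({("name", "Kebede"), ("age", 45), ("sex", "M")}),
    frozenset({("name", "Zahra"), ("age", 26), ("sex", "F")}),
    frozenset({("name", "Ahmed"), ("age", 53), ("sex", "M")}),
}


def row_conforms(row, header, domains):
    """Check that a row has exactly the header's columns and that each
    value is accepted by the domain of its column."""
    columns = dict(header)
    values = dict(row)
    return values.keys() == columns.keys() and all(
        domains[columns[col]](val) for col, val in values.items()
    )


all_ok = all(row_conforms(row, header, domains) for row in body)
print(all_ok)
```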
Properties of the relations
Relations, or tables, possess certain important features:
1. There are no duplicate rows (tuples) allowed in a table; in other words,
there must not be two identical rows in a table.
This is a very important property of the relational model: if duplicate
rows were allowed in a table, there would be no way for a program to
uniquely reference a particular row, which would create an inherent
problem for programming.
2. All values in a column are atomic: each is a single scalar value, never a
collection of several values.
3. Rows in a table are not ordered, i.e., within a table there is no inherent
ordering of rows (top to bottom).
4. Columns in a table are not ordered, i.e., within a table there is no inherent
ordering of columns (left to right).
The last two properties mean that operations on tables in the relational model
should not depend on a specific ordering of columns or rows. In short, all rows
and fields are equal in the sense that none of them has to exist in the context of
others and that none is higher or lower than others in the overall data structure.
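Properties 1, 3 and 4 follow naturally once a table is treated as a mathematical set. As a small sketch (the helper `make_row` is a hypothetical name), two tables written with different row order, different column order and a repeated row turn out to be the same table:

```python
def make_row(**columns):
    """A row is an unordered set of (column name, value) pairs."""
    return frozenset(columns.items())


table_a = {
    make_row(name="Abebe", age=23, sex="M"),
    make_row(name="Zahra", age=26, sex="F"),
}

# Same rows in another order, columns in another order, one row repeated.
table_b = {
    make_row(sex="F", age=26, name="Zahra"),
    make_row(age=23, sex="M", name="Abebe"),
    make_row(name="Abebe", age=23, sex="M"),
}

same_table = table_a == table_b
print(same_table, len(table_b))
```

Because a Python set cannot hold the same element twice, the repeated row simply collapses, which mirrors the rule that a relation has no duplicate tuples.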
Kinds of Relations (tables)
The following types of tables can be defined in a relational system.
Base relations/tables
These tables are created by a data definition language command. Data in base
tables does not come from any source internal to the database; instead, data must be
entered into base tables manually or through some batch data transfer process.
Data in these tables is stored “permanently” in relation to the database itself.
Views
A view is a named derived table (relation) that is represented within the system
purely by its definition in terms of other base tables or views. Views can be
treated just like real tables.
Snapshots
A snapshot is a named derived table, like a view, but unlike a view it stores its own data
rather than only its definition. Snapshots can also be treated like base tables. A
snapshot can be thought of as the saved form of the result of some query that produces a
table.
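The difference between a view and a snapshot shows up as soon as the base table changes. The following is a rough sketch using Python's built-in `sqlite3` module. SQLite has no snapshot statement, so `CREATE TABLE ... AS SELECT` is used here as a stand-in for one; the table and column names, and the extra row for Almaz, are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (name TEXT PRIMARY KEY, age INTEGER, sex TEXT);
    INSERT INTO person VALUES ('Abebe', 23, 'M'), ('Zahra', 26, 'F');

    -- A view stores only its definition.
    CREATE VIEW women_view AS SELECT name FROM person WHERE sex = 'F';

    -- Stand-in for a snapshot: the query result is copied and stored.
    CREATE TABLE women_snapshot AS SELECT name FROM person WHERE sex = 'F';

    -- Change the base table after both objects exist.
    INSERT INTO person VALUES ('Almaz', 30, 'F');
""")

view_rows = conn.execute("SELECT name FROM women_view ORDER BY name").fetchall()
snapshot_rows = conn.execute("SELECT name FROM women_snapshot ORDER BY name").fetchall()
print(view_rows)      # recomputed from the base table
print(snapshot_rows)  # frozen at the time the snapshot was taken
```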
Query result
This is the final output table resulting from the specified query. It may or may not
be saved or have persistent existence.
Temporary tables
Temporary tables are usually created by the DBMS and destroyed by it at some appropriate
time. They can include intermediate tables that are created while some large
operation is underway and removed when the operation is finished.
The Catalog
The catalog, or data dictionary, contains detailed information regarding the various
objects in the relational database: the tables, indexes, user information,
data integrity and security rules, and so on. The catalog itself is made up of tables
that can be manipulated like any other table in the system. In most cases the DBMS
keeps them hidden, though it remains possible to manipulate or work on
them.
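As one concrete illustration of a catalog that is itself a table, SQLite keeps its schema in a table named `sqlite_master`, which can be read with an ordinary query. The sketch below uses Python's `sqlite3` module; the `departments` table and the index name are assumptions for the example, and other DBMSs organise their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (dep_id TEXT PRIMARY KEY, dep_name TEXT);
    CREATE INDEX idx_dep_name ON departments (dep_name);
""")

# The catalog is queried like any other table; internal objects are skipped.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master "
    "WHERE name NOT LIKE 'sqlite_%' ORDER BY name"
).fetchall()
print(catalog)
```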
Relational Data Integrity
Integrity rules are rules or checking mechanisms that, when applied, guide the
system to prohibit the entry of invalid or unacceptable (for the case under
consideration) data, or operations that would result in such data.
Integrity rules may be defined at different levels: the column (field) level, the row (record)
level, the table level or the database level.
For example, some of the rules needed for a supplier-parts database system are
shown below:
• Supplier id numbers must be of the form snnnn.
• Part numbers must be of the form pnnnn.
• Part colors must be Red, Green or White only.
• Part weight must be greater than zero.
The database definition needs to be extended to include such rules, the purpose
of which is to inform the DBMS of certain constraints of the real world (such as
the constraint that part weight cannot be negative), so that it can prevent such
undesired values from entering the system. The DBMS needs to
monitor all INSERT and UPDATE operations and reject any operation that
attempts to enter invalid entries (such as a negative weight).
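The part rules listed above can be declared to a DBMS so that it rejects invalid INSERT operations on its own. The sketch below shows one way to do this with SQLite CHECK constraints through Python's `sqlite3` module. The table layout, the column names and the helper `try_insert` are assumptions for the example, and the form pnnnn is read as the letter p followed by four digits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE parts (
        part_no TEXT PRIMARY KEY
                CHECK (part_no GLOB 'p[0-9][0-9][0-9][0-9]'),
        color   TEXT CHECK (color IN ('Red', 'Green', 'White')),
        weight  REAL CHECK (weight > 0)
    )
""")


def try_insert(row):
    """Attempt an INSERT; return False when the DBMS rejects the row."""
    try:
        conn.execute("INSERT INTO parts VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


valid_part = try_insert(("p0001", "Red", 12.0))
negative_weight = try_insert(("p0002", "Green", -5.0))
unknown_color = try_insert(("p0003", "Blue", 3.0))
malformed_number = try_insert(("x1", "White", 3.0))
print(valid_part, negative_weight, unknown_color, malformed_number)
```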
The relational model provides two general integrity features that are applicable to
any relational database system. These are a) Candidate Keys and b) Foreign Keys.
Candidate Keys
A table can contain multiple rows of data, and each row of a table in a relational
system must be uniquely identified by some column or combination of columns in that table.
All columns (or combinations of columns) in a table with unique values are
referred to as candidate keys. Among the candidate keys found in a table, one can
be selected as the primary key of that table; all other candidate keys
are referred to as alternate keys. Keys can be simple or
composite. A simple key is made up of one column, whereas a composite key
is made up of two or more columns. If a subset of the columns of a composite key
already satisfies the definition of a candidate key, then that composite key is not
considered a candidate key.
In most DBMSs, indexes are used to implement candidate keys. Hence the unique
indexes found in such systems are very similar to candidate keys. Note that a
table that does not have a candidate key can display strange behavior in some
circumstances.
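The two conditions on a candidate key (unique values, and no smaller combination of its columns that is already unique) can be checked mechanically against sample data. The sketch below uses illustrative employee rows and hypothetical helper names. Note the limitation: a real key is a rule about every valid state of the table, so a check against the rows that happen to be present can only suggest candidates, not prove them.

```python
from itertools import combinations

# Illustrative sample rows.
employees = [
    {"emp_id": "A1", "emp_name": "Abebe", "dep_id": "D1", "salary": 456.90},
    {"emp_id": "A2", "emp_name": "Almaz", "dep_id": "D2", "salary": 600.00},
    {"emp_id": "B1", "emp_name": "Belay", "dep_id": "D1", "salary": 677.00},
    {"emp_id": "A3", "emp_name": "Ahmed", "dep_id": "D2", "salary": 600.00},
    {"emp_id": "G1", "emp_name": "Genet", "dep_id": "D3", "salary": 500.00},
]


def is_unique(rows, columns):
    """True when no two rows share the same values in the given columns."""
    seen = {tuple(row[c] for c in columns) for row in rows}
    return len(seen) == len(rows)


def is_candidate_key(rows, columns):
    """Unique, and no proper subset of the columns is already unique."""
    if not is_unique(rows, columns):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(1, len(columns))
        for subset in combinations(columns, size)
    )


print(is_candidate_key(employees, ("emp_id",)))           # simple key
print(is_candidate_key(employees, ("emp_id", "dep_id")))  # reducible composite
print(is_candidate_key(employees, ("dep_id",)))           # repeated values
```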
The Entity Integrity Rule
The entity integrity rule specifies that no component of a candidate key or of the
primary key is allowed to be null; every such column must hold a value.
Choosing among the Candidate keys
Since there may be multiple candidate keys in a table, you must make a decision
as to which candidate key is to be the primary key. There is no general rule that
can be applied here; you have to use your own judgment in many cases. Some
rules of thumb are:
• Choose the column(s) least likely to change.
• Choose as few columns as possible.
• Choose columns that are familiar to users, if possible.
Foreign keys
The power of a relational database system lies in the fact that rows (or records) in
one table can be matched to records in other tables through the use of keys;
primary keys would be largely useless if they were not used for cross-referencing
between tables. Primary keys are referenced through foreign keys. A foreign key
is a column in a table used to reference a primary key in another table. Take, as
an example, the following tables that hold data about a company’s employees and
all departments in the company.
Employees
Emp_id  Emp_name  Dep_id  Salary
A1      Abebe     D1      456.90
A2      Almaz     D2      600.00
B1      Belay     D1      677.00
A3      Ahmed     D2      600.00
G1      Genet     D3      500.00

Departments
Dep_id  Dep_name        Budget
D1      Administration  33456.90
D2      Planning        33600.00
D3      Sales           33677.00
D4      Purchase        66600.00
D5      Construction    56600.00
Note that the Dep_id column appears in both the employees and departments
tables. In the departments table, Dep_id is the primary key; in the employees
table, however, this field is used as a foreign key. You must make sure that
foreign keys and their corresponding primary keys share a common meaning and
draw their values from the same domain. Any column, including the primary key,
can be a foreign key, and a foreign key can also be simple or composite. In the above example,
the employees table is referred to as the referencing table and the departments table is
referred to as the referenced (target) table.
The Referential Integrity Rule
Along with the foreign key concept, the relational model includes the referential
integrity rule. The rule says the database must not contain any unmatched foreign
key values. The term ‘unmatched foreign key value’ here means a foreign key
value for which there does not exist a matching value of the relevant primary key
in the target table.
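Stated as a set operation, the rule says that the foreign key values, minus the primary key values of the target table, must leave nothing behind. A minimal sketch, using the department ids from the example tables and a hypothetical helper name:

```python
# Primary key values of the target table (departments).
department_ids = {"D1", "D2", "D3", "D4", "D5"}

# Foreign key value held by each employee row (emp_id -> dep_id).
employee_departments = {"A1": "D1", "A2": "D2", "B1": "D1", "A3": "D2", "G1": "D3"}


def unmatched_foreign_keys(foreign_values, primary_values):
    """Return the foreign key values that have no matching primary key."""
    return sorted(set(foreign_values) - set(primary_values))


consistent = unmatched_foreign_keys(employee_departments.values(), department_ids)

# A made-up reference to a non-existent department D9 breaks the rule.
broken = unmatched_foreign_keys(
    [*employee_departments.values(), "D9"], department_ids
)
print(consistent, broken)
```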
The cases to be considered are:
1. What should happen on an attempt to delete the target of a foreign key
reference? For example, an attempt to delete a department for which there
exists at least one employee. In general, there are two possibilities:
Restricted: the delete operation is restricted to the case where there are no
such matching records (it is rejected otherwise).
Cascades: the delete operation cascades to delete all records with matching
values in the referencing table (employees).
2. What should happen on an attempt to update a primary key that is the target of a
foreign key reference? For example, an attempt to update a department id for
which there exists at least one employee. In general, there are two
possibilities:
Restricted: the update operation is restricted to the case where there are no such
matching records (it is rejected otherwise).
Cascades: the update operation cascades to update the foreign key in the
matching records in the referencing table (employees).
For each foreign key in the design, the database designer should specify not
only the columns that constitute the foreign key and the table it references, but also the
foreign key rules to apply when these situations occur.
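The restricted and cascading delete rules can be tried out with SQLite through Python's `sqlite3` module. This is a sketch under some assumptions: SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, the helper `build` is a hypothetical name, and only two rows per table are loaded to keep the example short. The update case works the same way with an `ON UPDATE` clause.

```python
import sqlite3


def build(on_delete):
    """Create a small departments/employees database with the given delete rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default
    conn.executescript(f"""
        CREATE TABLE departments (dep_id TEXT PRIMARY KEY, dep_name TEXT);
        CREATE TABLE employees (
            emp_id   TEXT PRIMARY KEY,
            emp_name TEXT,
            dep_id   TEXT REFERENCES departments (dep_id) ON DELETE {on_delete}
        );
        INSERT INTO departments VALUES ('D1', 'Administration'), ('D2', 'Planning');
        INSERT INTO employees VALUES ('A1', 'Abebe', 'D1'), ('A2', 'Almaz', 'D2');
    """)
    return conn


# Restricted: deleting a department that still has employees is rejected.
restricted = build("RESTRICT")
try:
    restricted.execute("DELETE FROM departments WHERE dep_id = 'D1'")
    delete_rejected = False
except sqlite3.IntegrityError:
    delete_rejected = True

# Cascades: the matching employee rows are deleted along with the department.
cascading = build("CASCADE")
cascading.execute("DELETE FROM departments WHERE dep_id = 'D1'")
remaining = cascading.execute("SELECT emp_id FROM employees").fetchall()
print(delete_rejected, remaining)
```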
Introductory Notes on Set Theory and Relation
• Mathematical relations:
– Cartesian product: Set1 x Set2
  • Set1 contains 1, 2, 3
  • Set2 contains 6, 7, 8
  • The Cartesian product contains:
    (1,6),(1,7),(1,8),(2,6),(2,7),(2,8),(3,6),(3,7),(3,8)
– A relation is any subset of this Cartesian product
Example: If
customer_name = {Jones, Smith, Curry, Lindsay}
customer_street = {Main, North, Park}
customer_city = {Harrison, Rye, Pittsfield}
Then r = { (Jones, Main, Harrison),
(Smith, North, Rye),
(Curry, North, Rye),
(Lindsay, Park, Pittsfield) }
is a relation over
customer_name x customer_street x customer_city
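Both examples can be reproduced with `itertools.product` from the Python standard library. In this sketch, `is_relation` is a hypothetical helper that tests whether a set of tuples is a subset of the Cartesian product of the given domains.

```python
from itertools import product

set1 = {1, 2, 3}
set2 = {6, 7, 8}
cartesian = set(product(set1, set2))  # all ordered pairs (a, b)


def is_relation(candidate, *domains):
    """A relation over the domains is any subset of their Cartesian product."""
    return set(candidate) <= set(product(*domains))


customer_name = {"Jones", "Smith", "Curry", "Lindsay"}
customer_street = {"Main", "North", "Park"}
customer_city = {"Harrison", "Rye", "Pittsfield"}

r = {
    ("Jones", "Main", "Harrison"),
    ("Smith", "North", "Rye"),
    ("Curry", "North", "Rye"),
    ("Lindsay", "Park", "Pittsfield"),
}

print(len(cartesian))
print(is_relation(r, customer_name, customer_street, customer_city))
```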
• Let S1 = {0,1}
• Let S2 = {a,b,c}
• Let R ⊆ S1 x S2
• Then for example: r(R) = {<0,a>, <0,b>, <1,c>}
is one possible “state” or “population” or “extension” r of
the relation R, defined over domains S1 and S2.
It has three tuples.
Data Models Relation
Relation
– A relation may be thought of as a set of rows.
– A relation may alternately be thought of as a set of columns.
– Each row represents a fact that corresponds to a real-world entity or
relationship.
– Each row has a value of an item or set of items that uniquely identifies that
row in the table.
– Sometimes row-ids or sequential numbers are assigned to identify the
rows in the table.
– Each column typically is called by its column name or column header or
attribute name.
Relation properties:
– A distinct name for each relation in the same relational schema
– 1 or more attributes, each with:
  • a distinct name within the same relation
  • all values from the same domain
  • no significant order within a relation
  • a domain constraint
  • Attributes are unordered (left to right)
  • All attribute values are atomic
– 0..N tuples, with each tuple:
  • not duplicated
  • containing no more than 1 atomic value per cell (row-column
    intersection), i.e., 1st Normal Form
  • in no significant order within the relation
• The only structure available is a 2-dimensional file of data.
• This is known as a relation or table.
• Each entity corresponds to a table and each attribute to a column (or field) in
that table.
• Each entity occurrence corresponds to a row of the table
• Data is held in tables
• There is no order of data in the tables - either in row or attribute
• Primary Key - Foreign Key relationship
• Data Typing including NULLS
• Query Access - insert, update, delete, retrieval
• Indexing on Candidate (and Primary) keys
• Generally, a relation is made up of 2 parts:
– Instance: a table, with rows and columns.
  #Rows = cardinality, #fields = degree / arity.
Relation Schema R
• Used to describe a relation
• R(A1,...,An)
• R: relation name
• A1,...,An: a list of attributes
• Degree of relation, n: number of attributes
• e.g. CAR(SerialNo, Make, Model, Year)
Relation State r(R) (= r)
• Relation extension: the current state of a relation R, as a set of n-tuples
• r(R) = {t1, ..., tm}
• n-tuple t = <v1, ..., vn>: an ordered list of n values
Database Schema
– Set of relational schemas
– E.g. Product(Name, Price, Category, Manufacturer),
Vendor(Name, Address, Phone),
.......
Relational Database State of S
• defined as a set of relation states at a particular point in time
• DB = {r1, ..., rn}, where ri is a state of Ri
• each relation state ri satisfies the integrity constraints
The relational model has three parts:
– the data structures;
– the integrity constraints;
– the data manipulation operators.
Data Structures - domain, attribute, relation, row (tuple), primary key, degree,
cardinality.
Integrity Constraints - entity integrity and referential integrity.
Data Manipulation Operations - defined through relational algebra and the equivalent
relational calculus.
– Domain
– Domain of an Attribute
– A domain D is a set of atomic (indivisible) values.
– A domain may include a name, data type, and format. Examples:
  • USA_phone_number: the set of ten-digit phone numbers valid
    in the US.
  • GPA: possible values of computed grade point averages; each
    must be a real number between 0 and 4.
  • Eth_telephonenumber: the set of ten-digit phone numbers valid in Ethiopia.
– Domain of Ai is denoted by dom(Ai)
– Data types: integer, char(20), date, float number, etc.
– A domain may have a data type or a format defined for it.
USA_phone_number may have the format (ddd)-ddd-dddd, where each d is a
decimal digit. Dates, for example, have various formats such as monthname, date, year or
yyyy-mm-dd, or dd mm, yyyy.
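A domain defined by a data type, a range or a format amounts to a membership test. The sketch below expresses three of the domains above as Python predicates; the function names are assumptions for the example, and only the yyyy-mm-dd date format is handled.

```python
import re
from datetime import datetime

# Format (ddd)-ddd-dddd, where each d is a decimal digit.
USA_PHONE = re.compile(r"\(\d{3}\)-\d{3}-\d{4}")


def in_usa_phone_domain(text):
    """True when the whole string matches the phone number format."""
    return USA_PHONE.fullmatch(text) is not None


def in_gpa_domain(value):
    """True for a real number between 0 and 4 inclusive."""
    return isinstance(value, (int, float)) and 0 <= value <= 4


def in_date_domain(text):
    """True when the text is a valid calendar date in yyyy-mm-dd format."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
        return True
    except ValueError:
        return False


print(in_usa_phone_domain("(404)-555-1234"), in_gpa_domain(3.5), in_date_domain("2020-02-29"))
```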
Attribute