Duty 5 Designing a Relational Database

The Basics of Relational Database

The relational system is the most widely used database system on PCs. The relational database model has become the de facto standard for the design of databases both large and small. The relational model was developed by the mathematician Dr. E. F. Codd in the early 1970s. He used the mathematical concept of a relation as the basis of a structured method of storing and retrieving data. The background on which the relational database was formed is as follows:

“A relation is a two-dimensional table, consisting of a collection of rows and columns. A series of these tables make up a database. A table can be likened to a file in a conventional file-based system, and each column is like a field. A row can then be compared to a record in the file.”

In short, relational database systems are database systems that use the relational data model. The following sections discuss the relational model and its building blocks in further detail.

A relational system is one in which the user perceives the data as tables and uses certain operators to work on the data. The relational model is divided into three parts, having to do with objects, integrity and operators.

Relational Objects

The relational data objects are domains and relations.

Domains

In this model, the smallest semantic units of data are called scalars; they cannot be decomposed (broken apart) further without losing meaning. For example, a sex field may have the values F and M. An address column, however, may be made to hold Woreda, Kebele and house-number information. In that case the address data cannot be called a scalar, as it can be further decomposed into three data units, each of which can represent a separate entity.

A domain is a pool of values from which specific attributes of specific relations draw their actual values. A domain can also be defined as a set of scalar values, all of the same type.
For example:
    Domain of cities (Addis Ababa, Jimma, Diredawa, Lekempt, Assosa)
    Domain of courses (English, Amharic, Database Management, Geography)

The domain concept is not supported by many RDBMSs (Relational Database Management Systems); instead, built-in data types are used. Domains can be used for refusing invalid entries and facilitating comparisons.

The data definition for domains takes the form:
    Create domain name definition
To change an existing domain definition we can use:
    Alter domain Dname definition
A domain can be removed from the system using:
    Destroy domain name

Relations

A relation is made up of attributes (columns) and tuples (rows). The terms relation, tuple and attribute are the more formal equivalents of table, row and column (or table, record and field). In the discussion to follow, for the sake of simplicity, we will avoid the formal relational terms and use their informal equivalents instead.

A table is composed of a predefined number of columns and a changeable number of rows. Each column in a table draws its values from a domain defined earlier. A table can be viewed as a mathematical set made up of a table header and a table body. The table header consists of pairs <column name: domain name>, whereas the table body contains a set of rows, each of which is a set of pairs <column name: value>. For example, suppose we have the following table:

    Name    Age  Sex
    Abebe   23   M
    Kebede  45   M
    Zahra   26   F
    Ahmed   53   M

The table definition requires a definition of each column and its associated domain.
The domains for this table may be defined as:
    Create domain dname character(20)
    Create domain dage numeric(2)
    Create domain dsex character(1)

The table header definition will look like:
    { (name, dname), (age, dage), (sex, dsex) }

The body part of the table will look like:
    { (<name:'Abebe'>, <age:23>, <sex:'M'>),
      (<name:'Kebede'>, <age:45>, <sex:'M'>),
      (<name:'Zahra'>, <age:26>, <sex:'F'>),
      (<name:'Ahmed'>, <age:53>, <sex:'M'>) }

The data definition for relations takes the form:
    create base relation_name
        (column_definition_commalist
         candidate-key-definition-list
         foreign-key-definition-list)

Properties of Relations

Relations, or tables, possess certain important properties:
1. No duplicate rows (tuples) are allowed in a table; in other words, there must not be two identical rows in a table. This is a very important property of the relational model, for if duplicate rows were allowed, a program would have no way to reference a particular row uniquely, which would create an inherent problem for programming.
2. All values in a column are atomic: each is a single scalar value, never a collection of several values.
3. Rows in a table are not ordered, i.e., within a table there is no inherent ordering of rows (top to bottom).
4. Columns in a table are not ordered, i.e., within a table there is no inherent ordering of columns (left to right).

The last two properties mean that operations on tables in the relational model should not depend on a specific ordering of columns or rows. In short, all rows and columns are equal in the sense that none of them has to exist in the context of others and none is higher or lower than the others in the overall data structure.

Kinds of Relations (Tables)

The following types of tables can be defined in a relational system.

Base relations/tables
These tables are created by some data definition language command.
Data in base tables do not come from any source internal to the database; instead, data must be entered into base tables manually or through some batch data transfer process. Data in these tables are stored “permanently” in relation to the database itself.

Views
A view is a named, derived table (relation) that is represented within the system purely by its definition in terms of other base tables or views. Views can be treated just like real tables.

Snapshots
A snapshot is a named, derived table like a view, but unlike a view it stores its own data rather than only its definition. Snapshots can also be treated as base tables. A snapshot can be taken as the saved form of the result of some query that produces a table.

Query results
A query result is the final output table resulting from a specified query. It may or may not be saved or have persistent existence.

Temporary tables
These are tables that are usually created by the DBMS and destroyed by it at some appropriate time. They can include intermediate tables that are created while some large operation is underway and removed when the operation is finished.

The Catalog

The catalog, or data dictionary, contains detailed information regarding the various objects in the relational database: the tables, indexes, user information, data integrity and security rules, and so on. The catalog itself is made up of tables that can be manipulated like any other table in the system. In most cases the DBMS may keep them hidden, while still allowing them to be manipulated or worked on.

Relational Data Integrity

Integrity rules are rules or checking mechanisms that, when applied, guide the system to prohibit the entry of invalid or unacceptable data (for the case under consideration), or operations that would result in such data. Integrity rules may be defined at different levels: column (field) level, row (record) level, table level, or database level. For example:
Some of the rules needed in a supplier-parts database system are shown below:
    Supplier id numbers must be of the form snnnn.
    Part numbers must be of the form pnnnn.
    Part colors must be red, green or white only.
    Part weight must be greater than zero.

The database definition needs to be extended to include such rules, whose purpose is to inform the DBMS of certain constraints of the real world (such as the constraint that a part weight cannot be negative), so that it can prevent such undesired values from entering the system. The DBMS needs to monitor all INSERT and UPDATE operations and reject any operation that attempts to enter an invalid value (such as a negative weight).

The relational model provides two general integrity features that are applicable to any relational database system: a) candidate keys and b) foreign keys.

Candidate Keys

Tables can contain multiple rows of data, and each row of a table in a relational system must be uniquely identified by some column or combination of columns in that table. All columns (or combinations of columns) in a table with unique values are referred to as candidate keys. Among the candidate keys of a table, one can be selected as the primary key of that table; all the other candidate keys are referred to as alternate keys.

Keys can be simple or composite. A simple key is made up of one column, whereas a composite key is made up of two or more columns. If a smaller subset of the columns of a composite key already satisfies the definition of a candidate key, then that composite key is not considered a candidate key.

In most DBMSs, indexes are used to implement candidate keys. Hence the unique indexes found in such systems correspond to candidate keys. Note that a system that does not have candidate keys can display strange behavior in some circumstances.
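As a rough illustration of the candidate key definition above, the following Python sketch searches a small table for the minimal column combinations whose values are unique. The function names is_unique and candidate_keys are hypothetical, and the sample rows are the Name/Age/Sex table used earlier. The sketch only inspects the rows currently stored; a real candidate key is a design decision that must hold for every row the table may ever contain.

```python
from itertools import combinations


def is_unique(rows, columns):
    """Return True if no two rows share the same values in `columns`."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, columns):
    """Return the minimal column combinations that are unique in `rows`."""
    keys = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            # A combination that contains a smaller key is unique but not
            # minimal, so it does not count as a candidate key.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_unique(rows, combo):
                keys.append(combo)
    return keys


# Sample data: the Name/Age/Sex table from earlier in these notes.
people = [
    {"name": "Abebe", "age": 23, "sex": "M"},
    {"name": "Kebede", "age": 45, "sex": "M"},
    {"name": "Zahra", "age": 26, "sex": "F"},
    {"name": "Ahmed", "age": 53, "sex": "M"},
]

found = candidate_keys(people, ["name", "age", "sex"])
print(found)  # [('name',), ('age',)]
```

Age appears in the result only because the four sample ages happen to differ. This is why the designer's judgment, not the current data alone, decides which columns are declared as keys.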
The Entity Integrity Rule

The entity integrity rule specifies that no component of the primary key (or of any candidate key) is allowed to be null; each component must hold some value.

Choosing among the Candidate Keys

Since there may be multiple candidate keys in a table, you must make a decision as to which candidate key is to be the primary key. There is no general rule that can be applied here; in many cases you have to use your own judgment. Some rules of thumb are:
    Choose the column(s) least likely to change.
    Choose as few columns as possible.
    Choose columns that are familiar to users, if possible.

Foreign Keys

The power of a relational database system lies in the fact that rows (or records) in one table can be matched to rows in other tables through the use of keys; primary keys would therefore be largely useless if they were not used for cross-referencing between tables. Primary keys are referenced through foreign keys. A foreign key is a column in a table used to reference a primary key in another table. Take, as an example, the following tables that hold data about a company’s employees and all departments in the company.

Employees
    Emp_id  Emp_name  Dep_id  Salary
    A1      Abebe     D1      456.90
    A2      Almaz     D2      600.00
    B1      Belay     D1      677.00
    A3      Ahmed     D2      600.00
    G1      Genet     D3      500.00

Departments
    Dep_id  Dep_name        Budget
    D1      Administration  33456.90
    D2      Planning        33600.00
    D3      Sales           33677.00
    D4      Purchase        66600.00
    D5      Construction    56600.00

Note that the Dep_id column appears in both the employees and departments tables. In the departments table, Dep_id is the primary key; in the employees table, however, this column is used as a foreign key. You must make sure that foreign keys and their corresponding primary keys share a common meaning and draw their values from the same domain. Any column, including the primary key, can be a foreign key, and a foreign key can also be simple or composite. In the above example, the employees table is referred to as the referencing table and the departments table as the referenced (target) table.
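The cross-referencing described above can be sketched in a few lines of Python. This is only an illustration using plain dictionaries, not how a DBMS stores or joins tables; the helper names department_of and staff_by_department are hypothetical, and the data are the employees and departments tables shown above.

```python
# Departments keyed by their primary key, Dep_id.
departments = {
    "D1": {"dep_name": "Administration", "budget": 33456.90},
    "D2": {"dep_name": "Planning", "budget": 33600.00},
    "D3": {"dep_name": "Sales", "budget": 33677.00},
    "D4": {"dep_name": "Purchase", "budget": 66600.00},
    "D5": {"dep_name": "Construction", "budget": 56600.00},
}

# Each employee row carries dep_id as a foreign key into `departments`.
employees = [
    {"emp_id": "A1", "emp_name": "Abebe", "dep_id": "D1", "salary": 456.90},
    {"emp_id": "A2", "emp_name": "Almaz", "dep_id": "D2", "salary": 600.00},
    {"emp_id": "B1", "emp_name": "Belay", "dep_id": "D1", "salary": 677.00},
    {"emp_id": "A3", "emp_name": "Ahmed", "dep_id": "D2", "salary": 600.00},
    {"emp_id": "G1", "emp_name": "Genet", "dep_id": "D3", "salary": 500.00},
]


def department_of(employee):
    """Follow the foreign key dep_id to the matching departments row."""
    return departments[employee["dep_id"]]


def staff_by_department():
    """Group employee names under the department their foreign key points to."""
    staff = {dep_id: [] for dep_id in departments}
    for employee in employees:
        staff[employee["dep_id"]].append(employee["emp_name"])
    return staff


print(department_of(employees[0])["dep_name"])  # Administration
print(staff_by_department()["D2"])              # ['Almaz', 'Ahmed']
```

Departments D4 and D5 end up with empty staff lists: a primary key value does not need any foreign key pointing at it, whereas every foreign key value must find a matching primary key.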
The Referential Integrity Rule

Along with the foreign key concept, the relational model includes the referential integrity rule. The rule says that the database must not contain any unmatched foreign key values. The term ‘unmatched foreign key value’ here means a foreign key value for which there does not exist a matching value of the relevant primary key in the target table. The cases to be considered are:

1. What should happen on an attempt to delete the target of a foreign key reference? For example, an attempt to delete a department for which there exists at least one employee. In general, there are two possibilities:
    Restricted: the delete operation is restricted to the case where there are no such matching records (it is rejected otherwise).
    Cascades: the delete operation cascades to delete all records with matching values in the referencing table (employees).

2. What should happen on an attempt to update a primary key that is the target of a foreign key reference? For example, an attempt to update a department id for which there exists at least one employee. In general, there are two possibilities:
    Restricted: the update operation is restricted to the case where there are no such matching records (it is rejected otherwise).
    Cascades: the update operation cascades to update the foreign key in the matching records in the referencing table (employees).

For each foreign key in the design, the database designer should specify not only the columns that constitute the foreign key and the table it references, but also the foreign key rules to apply when these situations occur.
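The restricted and cascading rules can be tried out with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: SQLite is used only as a convenient stand-in for an RDBMS, the helper build_db is hypothetical, the table and column names follow the employees and departments example, and RESTRICT and CASCADE are the SQL keywords that correspond to the two rules described above.

```python
import sqlite3


def build_db(delete_rule, update_rule):
    """Create an in-memory database whose foreign key uses the given rules."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE departments (dep_id TEXT PRIMARY KEY, dep_name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE employees ("
        " emp_id TEXT PRIMARY KEY,"
        " emp_name TEXT NOT NULL,"
        " dep_id TEXT NOT NULL REFERENCES departments(dep_id)"
        f" ON DELETE {delete_rule} ON UPDATE {update_rule})"
    )
    conn.executemany(
        "INSERT INTO departments VALUES (?, ?)",
        [("D1", "Administration"), ("D2", "Planning")],
    )
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [("A1", "Abebe", "D1"), ("A2", "Almaz", "D2"), ("B1", "Belay", "D1")],
    )
    conn.commit()
    return conn


# Restricted: deleting a department that still has employees is rejected.
restricted = build_db("RESTRICT", "RESTRICT")
try:
    restricted.execute("DELETE FROM departments WHERE dep_id = 'D1'")
    delete_rejected = False
except sqlite3.IntegrityError:
    delete_rejected = True

# Cascades: the delete removes the matching employees as well.
cascading = build_db("CASCADE", "CASCADE")
cascading.execute("DELETE FROM departments WHERE dep_id = 'D1'")
remaining = [
    row[0] for row in cascading.execute("SELECT emp_id FROM employees ORDER BY emp_id")
]

# Cascades: changing a department id rewrites the foreign key in employees.
cascading.execute("UPDATE departments SET dep_id = 'D9' WHERE dep_id = 'D2'")
updated = cascading.execute(
    "SELECT dep_id FROM employees WHERE emp_id = 'A2'"
).fetchone()[0]

print(delete_rejected, remaining, updated)  # True ['A2'] D9
```

Which rule to choose is a design decision per foreign key: restricting protects employee records from accidental loss, while cascading keeps the two tables consistent without extra work by the application.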
Introductory Notes on Set Theory and Relation

Mathematical relations:
– Cartesian product: Set1 x Set2
    Set1 contains 1, 2, 3
    Set2 contains 6, 7, 8
    The Cartesian product contains: (1,6), (1,7), (1,8), (2,6), (2,7), (2,8), (3,6), (3,7), (3,8)
– A relation is any subset of this Cartesian product.

Example: If
    customer_name = {Jones, Smith, Curry, Lindsay}
    customer_street = {Main, North, Park}
    customer_city = {Harrison, Rye, Pittsfield}
then
    r = { (Jones, Main, Harrison), (Smith, North, Rye), (Curry, North, Rye), (Lindsay, Park, Pittsfield) }
is a relation over customer_name x customer_street x customer_city.

Let S1 = {0, 1}
Let S2 = {a, b, c}
Let R ⊆ S1 x S2
Then, for example, r(R) = {<0,a>, <0,b>, <1,c>} is one possible “state” or “population” or “extension” r of the relation R, defined over domains S1 and S2. It has three tuples.

Data Models

Relation
– A relation may be thought of as a set of rows.
– A relation may alternately be thought of as a set of columns.
– Each row represents a fact that corresponds to a real-world entity or relationship.
– Each row has a value of an item or set of items that uniquely identifies that row in the table.
– Sometimes row-ids or sequential numbers are assigned to identify the rows in the table.
– Each column is typically called by its column name, column header or attribute name.

Relation properties:
– A distinct name for each relation in the same relational schema
– 1 or more attributes, each with:
    a distinct name within the same relation
    all values from the same domain
    no significant order within a relation
    a domain constraint
  Attributes are unordered (left to right). All attribute values are atomic.
– 0..N tuples, with each tuple:
    not duplicated
    containing no more than 1 atomic value per cell (row-column intersection), i.e., 1st Normal Form
    in no significant order within the relation

• The only structure available is a 2-dimensional file of data.
• This is known as a relation or table.
• Each entity corresponds to a table and each attribute to a column (or field) in that table.
• Each entity occurrence corresponds to a row of the table.
• Data is held in tables.
• There is no order of data in the tables, either in rows or in attributes.
• Primary key - foreign key relationships
• Data typing, including NULLs
• Query access: insert, update, delete, retrieval
• Indexing on candidate (and primary) keys

Generally, a relation is made up of 2 parts:
• Instance: a table, with rows and columns. #Rows = cardinality, #fields = degree / arity.

Relation Schema R
• Used to describe a relation
• R(A1, ..., An)
• R: relation name
• A1, ..., An: a list of attributes
• Degree of relation, n: number of attributes
• e.g., CAR(SerialNo, Make, Model, Year)

Relation State r(R) (= r)
• Relation extension: the current status of a relation R as a set of n-tuples
• r(R) = {t1, ..., tm}
• n-tuple t = <v1, ..., vn>: an ordered list of n values

Database Schema
– Set of relational schemas
– E.g., Product(Name, Price, Category, Manufacturer), Vendor(Name, Address, Phone), ...

Relational Database State of S
• Defined as a set of relation states at a particular point in time
• DB = {r1, ..., rn}, where ri is a state of Ri
• Each relation state ri satisfies the integrity constraints

The relational model has three parts: the data structures, the integrity constraints, and the data manipulation operators.
Data Structures - domain, attribute, relation, row (tuple), primary key, degree, cardinality.
Integrity Constraints - entity integrity and referential integrity.
Data Manipulation Operations - defined through relational algebra and the equivalent relational calculus.

Domain
– Domain of an attribute
– A domain D is a set of atomic (indivisible) values.
– A domain may include a name, data type, and format. Examples:
    USA_phone_number: the set of ten-digit phone numbers valid in the US.
    GPA: possible values of computed grade point averages; each must be a real number between 0 and 4.
    Eth_telephonenumber: the set of ten-digit phone numbers valid in Ethiopia.
– The domain of Ai is denoted by dom(Ai).
– Data types: integer, char(20), date, float, etc.
– A domain may have a data type or a format defined for it. USA_phone_number may have the format (ddd)-ddd-dddd, where each d is a decimal digit. Dates have various formats, such as monthname date, year; yyyy-mm-dd; or dd mm, yyyy.

Attribute