Mini-MSDD
Relational Databases
Thomas P. Sturm
University of St. Thomas
Copyright © 1971-2002 Thomas P. Sturm

Outline
- Data Concepts
- Relational Model
- Normalization
- Logical Data Structures

Data Concepts
- Why learn about Relational Databases?
- Data - Information - Database
- Properties of Data Items
- Entity, Attribute, Value
- Descriptors / Identifiers
- Data Base Design
- Sample Database

Why learn about Relational Databases?
- A way to put end users into direct touch with the information stored in computers
- A way to increase the productivity of data processing professionals
- High-performance implementations of the relational model are available
- “No surprises” theoretical underpinnings (no “special rules”, no “that’s a feature, not a bug”)
- Universal acceptance, from the smallest to the largest databases
- Readily available design tools
- A standardized language for doing queries (SQL)

Data - Information - Database
INFORMATION: The meaning that a human assigns to data via the known conventions used in their representation.
DATA: A formalized representation of facts, concepts, or instructions suitable for communication, interpretation, or processing by human or automatic means.
BASE:
- The bottom of anything, considered as its support or foundation
- The fundamental part of a thing
- The chief ingredient of anything, viewed as its fundamental constituent
- Base in its most general sense equals bottom but, more specifically, implies a broad bottom by which something is held up or stabilized
DATABASE:
- A collection of stored operational data used by the application systems of some particular enterprise
- A stable foundation to support an information process
Properties of Data Items
- There are things about which data is collected: entities
- These entities can optionally have a name or names (both a class/type name and individual/instance names)
  - Entity Type: a category, arbitrarily defined (but agreed to) so that membership within the category can be established, at least at a point in time, e.g. a department
  - Entity Instance: an occurrence of a member of the category in the world, e.g. the payroll department
- There are certain things that it is desirable to describe about the entities. The various qualities (characteristics) of the entity that are to be described are referred to as attributes
- For each of these attributes, for each entity, there is potentially a value (taken from a legal set of values that obey certain constraints or rules)
- There is some structure in the data or stored values (relationships, associations, dependencies)
- Most important, the stored data items must have meaning
Ref. Thomas P. Sturm, Data and File Structures

Entity, Attribute, Value
EXAMPLE: For the entity “the car I drove on my first Sabbatical leave”
ATTRIBUTE        VALUE
Manufacturer     Ford
Model            Country Sedan
Body Type        Station Wagon
Model Year       1973
Color            Blue
Owner            Thomas P. Sturm
Class            Passenger Car
License Number   NBGO
Licensing State  Minnesota
Ref. Thomas P. Sturm, Data and File Structures

Descriptors / Identifiers
DESCRIPTOR: A descriptor for an entity is an attribute/value pair.
IDENTIFIER: An identifier is an attribute whose value is different for each entity.
- usually reserved for attributes whose values are necessarily different
- where necessary, an identifier can be made up of the concatenation of two attributes (which should be thought of as yet another attribute)
RETRIEVAL can be based on:
- identifier (for an identifier, find some descriptors)
- descriptor (for a descriptor, find some identifiers of entities possessing the descriptor)
- location (for a particular location, retrieve the data that is stored there): absolute location or relative location
(This third method of access is not allowed in the relational model.)

Data Base Design
- Impossible to model all of reality
- Select an appropriate subset of entities, attributes for those entities, and values for those attributes
- Select which interrelationships to preserve
- Abstract entities and relationships into classes in a way suitable for machine representation and human interpretation
- Organize, code, and structure the stored data
- Create convenient access paths
Figure: User | Model | ... | Model | Disk Pack, ranging from models suitable for human interpretation (user end) to models suitable for machine representation (disk end). A model should be:
- sufficiently abstract to allow minor perturbations
- sufficiently powerful to give some understanding about how data in the world are related
Ref: Thomas P. Sturm, Data and File Structures

Sample Database - Employees
Overview of Content: The database contains organization, budget, and scheduling information for a software group that is developing an academic information system.
Entities:
Employees - who have
- a name
- a job title
- a manager who, in turn, is an employee
- a hire date
- an hourly billing rate
- (possibly) a dollar annual bonus amount
- membership in a department, which in turn has a name, location, and budget
- a set of assigned tasks on projects
  - each task by each employee on each project has a time estimate in hours
  - each project has a name, description, budget, and due date
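The first two retrieval styles on the Descriptors / Identifiers slide can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions: the four emp rows come from the sample data, while the functions descriptors_of and identifiers_with are hypothetical names. Neither query says anything about where a row is stored, which is why location-based retrieval has no place in the relational model.

```python
import sqlite3

# In-memory database holding four rows of the sample emp table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (Ename TEXT PRIMARY KEY, Job TEXT, DeptNo INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("allen", "programmer", 402), ("king", "clerk", 402),
     ("olson", "analyst", 401), ("sturm", "clerk", 401)],
)

def descriptors_of(ename):
    """Retrieval by identifier: for an Ename, return its (Job, DeptNo) descriptors."""
    return conn.execute(
        "SELECT Job, DeptNo FROM emp WHERE Ename = ?", (ename,)
    ).fetchone()

def identifiers_with(job):
    """Retrieval by descriptor: for the pair Job = job, return the matching Enames."""
    rows = conn.execute(
        "SELECT Ename FROM emp WHERE Job = ? ORDER BY Ename", (job,)
    ).fetchall()
    return [ename for (ename,) in rows]

print(descriptors_of("allen"))    # ('programmer', 402)
print(identifiers_with("clerk"))  # ['king', 'sturm']
```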
Sample Database - Departments and Projects
Entities (continued)
Departments - which have
- a department number
- a department name
- a department location (room number)
- an annual dollar budget
- employees, who in turn have a name, job description, manager, hire date, hourly rate, annual bonus, and a set of assigned tasks (as described above)
Projects - which have
- a project name
- a project description
- a project budget
- a project due date
- a set of tasks, each of which is to be performed by one or more employees (who in turn have a name, job description, manager, hire date, ...) with a time estimate for each employee for each task

Sample Database - Tasks
Entities (continued)
Tasks - each of which has
- the name of the employee working on the task (who in turn has a name, job description, ...)
- the name of the project that the task is related to (which in turn has a name, description, ...)
- the name of the task being performed
- the time estimate (in hours) of how long an employee will work on a particular type of task for a particular project
Sample Data (stated in relational form)

Employees (table name emp)
Ename    Job         Mgr     Hired        Rate   DeptNo
allen    programmer  barger  09-jun-1991  30.00  402
barger   supervisor  turner  23-jan-1993  65.00  402
jones    programmer  radl    20-feb-1991  35.00  401
king     clerk       barger  22-feb-1991  18.00  402
martin   programmer  barger  09-nov-1991  25.00  402
olson    analyst     radl    28-apr-1991  55.00  401
pearson  programmer  radl    01-may-1991  30.00  401
radl     supervisor  turner  03-dec-1992  65.00  401
rogers   programmer  barger  08-sep-1992  25.00  402
smith    programmer  barger  17-dec-1990  35.00  402
sturm    clerk       radl    23-sep-1992  18.00  401
thomas   analyst     barger  03-dec-1992  50.00  402
turner   supervisor          02-mar-1991  75.00  400
vogel    consultant  turner  17-nov-1991  80.00  400
Bonus: only five values appear in the extracted slide (550.00, 0.00, 600.00, 0.00, 1000.00); the rows they belong to were not preserved, and the other rows show no value.

Departments (table name dept)
DeptNo  Dname             Loc  Dbudget
400     programming       200  150000.00
401     financial         200  275000.00
402     academic support  100  390000.00
403                       300    7000.00

Projects (table name proj)
Project_id  Description         Pbudget   Due_date
admit       Admissions          15000.00  07-apr-1998
alumni      Alumni development   7500.00  30-jan-1999
billing     Student billing     11000.00  30-jan-1998
budget      Budgeting           12500.00  12-mar-1998
payroll     Payroll              9000.00  15-may-1998
records     Students records     6000.00  11-feb-1998

Tasks (table name task)
Ename    Project_id  Tname      Hours
allen    admit       debug      25
allen    admit       implement  20
allen    billing     debug      30
allen    billing     implement  20
barger   admit       manage     15
barger   alumni      manage     10
barger   billing     manage      8
barger   records     manage     12
jones    billing     implement  35
jones    budget      implement  70
jones    payroll     debug      40
king     admit       clerical   25
king     alumni      clerical    9
king     records     clerical   15
martin   admit       implement  30
olson    admit       design     75
olson    alumni      design     40
olson    billing     design     20
olson    records     design     45
pearson  budget      debug      40
pearson  budget      implement  60
pearson  payroll     implement  80
radl     billing     design     15
radl     billing     manage     10
radl     budget      manage     15
radl     payroll     manage     20
rogers   records     debug      20
rogers   records     design     30
rogers   records     implement  45
smith    alumni      debug      30
smith    alumni      implement  90
smith    billing     implement  40
sturm    billing     clerical   38
sturm    budget      clerical   20
sturm    budget      debug      20
sturm    payroll     clerical   15
thomas   alumni      design      5
thomas   billing     design     45
thomas   budget      design     40
thomas   payroll     design     70
turner   billing     manage     12
turner   budget      design     45

Relational Model
- Relational Database Model
- Conceptual Idea of a Relation
- Translation of Relational Terms
- Requirements of a Relation
- Advantages of the Relational Model
- Differences in the Relational Model
- Details of Department Relation
- Organization of Relations in Sample Database
- (Single Table) Relational Operations
- Relational Algebra
- Two Table Relational Operations
- Cartesian Product
- Joins

Relational Database Model
“Codd's” Model
- E. F. (Ted) Codd, “A Relational Model of Data for Large Shared Data Banks”, CACM V13 #6 (June 1970), pp. 377-87
- Proposed in 1970 and developed further through the 1970s
- Based on the mathematical theory of relations
Codd's definition: Given sets S1, S2, ..., Sn (not necessarily distinct), R is a relation on these n sets if it is a set of n-tuples each of which has its first element from S1, its second element from S2, and so on. We shall refer to Sj as the jth domain of R. R is said to have degree n. If R has m n-tuples (or just tuples), R is said to have cardinality m.
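Codd's definition maps almost word for word onto Python sets and tuples. In this sketch the rows are the DeptNo, Loc, and Dbudget columns of the sample dept table; the domains S1 to S3 are assumptions chosen for illustration, since the slides do not spell them out. Because a set cannot hold two equal tuples, the duplicated row is cast out automatically.

```python
# Assumed domains (illustrative only): S1 = DeptNo, S2 = Loc, S3 = Dbudget in whole dollars.
S1 = {400, 401, 402, 403}
S2 = {100, 200, 300}
S3 = range(0, 1_000_001)
domains = [S1, S2, S3]

def is_relation_on(tuples, domains):
    """Codd: every member is an n-tuple whose j-th element comes from domain S_j."""
    n = len(domains)
    return all(len(t) == n and all(v in s for v, s in zip(t, domains)) for t in tuples)

rows = [(400, 200, 150000), (401, 200, 275000), (402, 100, 390000),
        (403, 300, 7000), (402, 100, 390000)]   # the last row repeats the third
dept = set(rows)                                 # the set casts out the duplicate

degree = len(domains)        # n
cardinality = len(dept)      # m
print(degree, cardinality, is_relation_on(dept, domains))  # 3 4 True
```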
Conceptual Idea of a Relation
Conceptual (but not physical) ideas:
- A relation is a table or a flat file with n columns (fields) and m rows (records)
- Column (or field) j represents a set of values (from a possible set of values, Sj, the “domain”) for a particular attribute of all the entities
- Each row (or record) represents a set of values for an entity, one for each attribute (column, field)
- Degree - number of columns (fields, domains)
- Cardinality - number of rows (records, entities, tuples)

Translation of Relational Terms
Relational Term  Loose Equivalent
Relation         Table
Tuple            Row
Degree           # of attributes
Cardinality      # of table entries (rows)
Domain           field-level edit criteria and integrity constraints

Requirements of a Relation
- All rows of the relation must have the same attributes in the same order
- No repeating groups
- Each row must be unique (no duplicate rows; if there are any, they are “cast out”)
- A set of columns that forms an identifier is the table key

Advantages of the Relational Model
- Logical, not physical, model: easy to communicate; what, not how
- Data independence: implementation independent
- Record interconnections are dynamically generated based on data values (no user-visible navigation links)
- Set-at-a-time database operations (relational operators): locate, permute, join, select, project, derive, order, format, present
- Join, the operator that “connects” tables, is unrestricted: it is not necessary to pre-define access paths

Details of Department Relation
The dept relation, labelled with its attributes (columns) across the top, one entity per tuple (row), and a domain for each column (domain 1 is DeptNo, domain 4 is Dbudget):
DeptNo  Dname             Loc  Dbudget
400     programming       200  150000
401     financial         200  275000
402     academic support  100  390000
403                       300    7000

Organization of Relations in Sample Database
Relation (entity type) and attributes (key attributes marked *):
emp   (Ename*, Job, Mgr, Hired, Rate, Bonus, DeptNo)
dept  (DeptNo*, Dname, Loc, Dbudget)
task  (Ename*, Project_id*, Tname*, Hours)
proj  (Project_id*, Description, Pbudget, Due_date)

(Single Table) Relational Operations
Specification                    Operation
named file, view, or relation    locate relation
boolean selection expression     entity selection
named attributes                 projection
derivation rules                 entry-level derivations
ordering specification           order
set-function specification       file-level derivations
format, edit spec., destination  formatting & presentation

Relational Algebra
- Relational operators take one or two relations as their “operands” or arguments
- The result of applying a relational operator to a relation (or pair of relations) is another relation
- Consequently, relational operators can be used in sequence to achieve the desired results

Two Table Relational Operations
Cartesian Product
- Each row of the first table concatenated with each row of the second table
- No compatibility requirements
Join
- A form of parallel table lookup
- Both tables must share a domain
Union
- All rows of the second table added to the rows of the first table
- Both tables must have the same domains
Set Difference
- All rows of the first table whose keys do not appear as keys in the second table
- Both tables must share the same domains for their keys
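The closure property of the relational algebra (every operator returns a relation, so operators can be chained) can be illustrated with a few set-based functions. This is a hedged sketch rather than any particular DBMS's implementation: Relation, select, project, product, union, and difference are hypothetical names, and difference follows the key-based definition on the slide above rather than plain set difference. The sample rows are the DeptNo, Loc, and Dbudget columns of dept.

```python
from typing import Callable, NamedTuple

class Relation(NamedTuple):
    header: tuple      # attribute names, in column order
    body: frozenset    # the rows; a set, so duplicates are cast out

def select(r: Relation, pred: Callable[[dict], bool]) -> Relation:
    """Selection: keep the rows for which pred(row) is true."""
    return Relation(r.header, frozenset(t for t in r.body if pred(dict(zip(r.header, t)))))

def project(r: Relation, attrs: list) -> Relation:
    """Projection: keep the named attributes; rows that become equal collapse."""
    idx = [r.header.index(a) for a in attrs]
    return Relation(tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in r.body))

def product(r1: Relation, r2: Relation) -> Relation:
    """Cartesian product: each row of r1 concatenated with each row of r2."""
    return Relation(r1.header + r2.header,
                    frozenset(a + b for a in r1.body for b in r2.body))

def union(r1: Relation, r2: Relation) -> Relation:
    """Union: both operands must have the same attributes (domains)."""
    if r1.header != r2.header:
        raise ValueError("union requires identical headers")
    return Relation(r1.header, r1.body | r2.body)

def difference(r1: Relation, r2: Relation, key: list) -> Relation:
    """Rows of r1 whose key values do not appear as keys in r2."""
    k1 = [r1.header.index(a) for a in key]
    k2 = [r2.header.index(a) for a in key]
    keys2 = {tuple(t[i] for i in k2) for t in r2.body}
    return Relation(r1.header,
                    frozenset(t for t in r1.body if tuple(t[i] for i in k1) not in keys2))

dept = Relation(("DeptNo", "Loc", "Dbudget"), frozenset({
    (400, 200, 150000), (401, 200, 275000), (402, 100, 390000), (403, 300, 7000)}))

# Closure: the output of select is a relation, so project can consume it.
at_loc_200 = project(select(dept, lambda row: row["Loc"] == 200), ["DeptNo"])
print(sorted(at_loc_200.body))  # [(400,), (401,)]
```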
Cartesian Product
If R1 and R2 are relations, the Cartesian product is written
R1 × R2 (in relational algebra)
or
SELECT * FROM R1, R2; (in SQL)
A new relation is generated that pairs every tuple in R1 with every tuple in R2 (each result tuple is a tuple of R1 followed by a tuple of R2).

relation empl
name   age  dept
able   20   35
baker  40   45
codd   60   45
date   30   25

relation group
dept  loc
35    100
45    200
25    100

Cartesian product empl × group
empl.name  empl.age  empl.dept  group.dept  group.loc
able       20        35         35          100
able       20        35         45          200
able       20        35         25          100
baker      40        45         35          100
baker      40        45         45          200
baker      40        45         25          100
codd       60        45         35          100
codd       60        45         45          200
codd       60        45         25          100
date       30        25         35          100
date       30        25         45          200
date       30        25         25          100

Join Operation
- Form the Cartesian product of two relations
- Cast out duplicates (assuming projection is done also)
- Apply join conditions to select a subset of the Cartesian product (selection)
There are various join types, differentiated by:
- which relations are used
- what the join conditions are
- what results are desired

Natural Join Operation (simple join, inner equijoin)
- Start with two different tables and form the Cartesian product (e.g. the 12-row empl × group shown above)
- Select the rows where the values of a pair of fields are equal (e.g. empl.dept and group.dept):
empl.name  empl.age  empl.dept  group.dept  group.loc
able       20        35         35          100
baker      40        45         45          200
codd       60        45         45          200
date       30        25         25          100
- Project all except the duplicated column:
empl.name  empl.age  dept  group.loc
able       20        35    100
baker      40        45    200
codd       60        45    200
date       30        25    100
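The empl and group example can be reproduced in SQLite through Python's sqlite3 module. This is a sketch assuming SQLite as the DBMS; because GROUP is a reserved word in SQL, the table name is written as the quoted identifier "group". The join query performs the two steps above: an equality selection on the product, then a projection that drops the duplicated dept column. The extra employee 'evans' is hypothetical, added only to show that a row with no match in the second table disappears from the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE empl (name TEXT, age INTEGER, dept INTEGER);
-- GROUP is a reserved word in SQL, so the table name must be quoted.
CREATE TABLE "group" (dept INTEGER, loc INTEGER);
INSERT INTO empl VALUES ('able', 20, 35), ('baker', 40, 45),
                        ('codd', 60, 45), ('date', 30, 25);
INSERT INTO "group" VALUES (35, 100), (45, 200), (25, 100);
""")

# Cartesian product: every empl row paired with every "group" row.
product_rows = conn.execute('SELECT * FROM empl, "group"').fetchall()

def natural_join():
    """Select rows with equal dept values, then project away the duplicate column."""
    return conn.execute(
        'SELECT empl.name, empl.age, empl.dept, "group".loc '
        'FROM empl, "group" WHERE empl.dept = "group".dept '
        'ORDER BY empl.name'
    ).fetchall()

join_rows = natural_join()
print(len(product_rows))  # 12
print(join_rows)

# Hypothetical row: dept 55 has no match in "group", so the join "loses" it.
conn.execute("INSERT INTO empl VALUES ('evans', 50, 55)")
join_after = natural_join()
print(len(join_after))    # 4
```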
Expressing the Natural Join
The natural join is written
empl × group where empl.dept = group.dept (in the relational algebra)
and
SELECT * FROM EMPL, GROUP WHERE EMPL.DEPT = GROUP.DEPT; (in SQL)
Note: GROUP is a reserved word in SQL, so in practice that table name must be written as a quoted identifier ("GROUP"); and SELECT * returns both dept columns, so the final projection step of the natural join requires an explicit column list.
The natural join performs a “table lookup” function by “looking up” data from the second table for a field in the first table.
Unfortunately, if no match is found in the second table for a value “looked up” from the first table, that row of the first table is “lost”.

Normalization
- Normalization Tools
- Attributes for a Relational Model
- Full Functional Dependence
- Full Functional Dependence Examples
- Normal Form Overview
- Universe of Relations
- First vs. Second Normal Form
- Second vs. Third Normal Form
- Third vs. Boyce-Codd Normal Form
- Fourth Normal Form
- Converting to 4NF
- Fifth Normal Form
- Converting to 5NF
- Domain/Key Normal Form
- Enforcing Domain Integrity in DK/NF

Normalization Tools
Decomposition (this section)
- each document, report, data flow, etc. is defined to be a relation
- any relations that violate the required normal form are divided into 2 or more relations that satisfy the normal form
- the resulting relational structure has a relation for each entity
Construction (next section)
- identify entities: objects that have attributes, identifiers, and relationships
- assign attributes to the right entities
  - attributes apply to all entity instances
  - attributes are fully functionally dependent on the whole identifier
- form “roles” and “intersection entities” where necessary
Attributes for a Relational Model
- Each entity instance has exactly one value for each attribute (within the scope of the data model)
  - values are atomic: repeating groups are not allowed, vectors are not allowed, and pointers and other abstract references are not allowed
  - values for a particular attribute come from a specified pool of values (called its domain)
- An attribute (or a specific set of attributes) forms an identifier for each entity instance
  - if the entity instances are different, so is the value (or set of values) for the attribute (or set of attributes)
  - an identifier (key) must be found and cannot have a null value
  - there may be more than one, especially since a set of attributes can be an identifier
  - it must be minimal (no attribute can be discarded without losing uniqueness)

Full Functional Dependence
If each value of an attribute has associated with it precisely one value of a second attribute, then that second attribute is functionally dependent on the first.
Example: in the emp relation:
- Ename is an identifier, and we choose it as the primary key
- other attributes of the employee will then generally be functionally dependent on Ename
- so Job is functionally dependent on Ename (or Ename functionally determines Job)
All attributes in a relation must necessarily be functionally dependent on the primary key.
There is a functional dependency if agreement on the first value necessarily implies agreement on the dependent value.
But remember that the primary key can be a set of fields. Full functional dependence means that no proper subset of that set of fields has the functional dependence.
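The “agreement implies agreement” test can be run against stored rows. The following is a minimal sketch: holds_fd and fully_dependent are hypothetical helper names, and the rows are small excerpts of the sample emp and task tables. One caution: a table's contents can refute a functional dependency but can never prove one, since dependencies are rules about all possible data. The checks reproduce Example A and Example B of the Full Functional Dependence Examples that follow.

```python
from itertools import combinations

def holds_fd(rows, lhs, rhs):
    """True if, in these rows, agreement on lhs always implies agreement on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:   # same lhs value, different rhs value
            return False
    return True

def fully_dependent(rows, lhs, rhs):
    """rhs depends on lhs, and on no proper subset of lhs (empty set included)."""
    if not holds_fd(rows, lhs, rhs):
        return False
    return not any(
        holds_fd(rows, subset, rhs)
        for k in range(len(lhs))
        for subset in combinations(lhs, k)
    )

# Excerpts from the sample emp and task tables.
emp = [
    {"Ename": "allen", "Job": "programmer", "Mgr": "barger"},
    {"Ename": "jones", "Job": "programmer", "Mgr": "radl"},
    {"Ename": "king", "Job": "clerk", "Mgr": "barger"},
    {"Ename": "olson", "Job": "analyst", "Mgr": "radl"},
]
task = [
    {"Ename": "allen", "Project_id": "admit", "Tname": "debug", "Hours": 25},
    {"Ename": "allen", "Project_id": "admit", "Tname": "implement", "Hours": 20},
    {"Ename": "allen", "Project_id": "billing", "Tname": "debug", "Hours": 30},
    {"Ename": "barger", "Project_id": "admit", "Tname": "manage", "Hours": 15},
    {"Ename": "martin", "Project_id": "admit", "Tname": "implement", "Hours": 30},
]

print(holds_fd(emp, ["Ename"], ["Job"]))                                   # True
print(fully_dependent(emp, ["Ename", "Mgr"], ["Job"]))                     # False (Example A)
print(fully_dependent(task, ["Ename", "Project_id", "Tname"], ["Hours"]))  # True (Example B)
```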
Full Functional Dependence Examples
Example A:
- Job is functionally dependent on (Ename, Mgr)
- BUT it is also true that Job is functionally dependent on (Ename)
- SO Job is not fully functionally dependent on (Ename, Mgr)
Example B:
- Hours is functionally dependent on (Ename, Project_id, Tname)
- AND Hours is not functionally dependent on (Ename), (Project_id), (Tname), (Ename, Project_id), (Ename, Tname), or (Project_id, Tname)
- SO Hours is fully functionally dependent on (Ename, Project_id, Tname)

Normal Form Overview
Each normal form is contained in the one above it:
Universe of All Data Relations (normalized / unnormalized)
  1st Normal Form
    2nd Normal Form
      3rd Normal Form
        Boyce-Codd Normal Form (BCNF)
          4th Normal Form
            5th Normal Form (PJ/NF)
              Domain/Key Normal Form (DK/NF)

Universe of Relations
- Any flat file is a relation (0th normal form), but not necessarily a “well formed” one
- Normalization provides a set of criteria to evaluate the “well-formedness” of a relation (but it is only one criterion for determining a “good” form)
- In general, a flat file may have repeating groups
Example 1 - suppliers:
part   suppliers
diode  (GE, TRW, Mot)
bulb   (GE, Syl)
Implemented as?
part   supplier1  supplier2  supplier3
diode  GE         TRW        Mot
bulb   GE         Syl
Eliminate repeating groups by repeating the key to obtain 1st normal form.
Example 1 - suppliers:
part   supplier
diode  GE
diode  TRW
diode  Mot
bulb   GE
bulb   Syl

First vs. Second Normal Form
Example 2 - inventory:
part#  warehouse#  wh_address  quantity
100    05          Mpls        200
100    08          StPaul      300
200    05          Mpls        250
200    10          Madison     400
300    08          StPaul      350
Problems occur because this table is not focused on one primary key: it is “about” two things, warehouses and parts in warehouses.
Eliminate the multiple focus of a composite key by breaking the table into 2 relations using projection to obtain 2nd normal form.
One table about warehouses:
warehouse#  wh_address
05          Mpls
08          StPaul
10          Madison
One table about inventory, with a composite key:
part#  warehouse#  quantity
100    05          200
100    08          300
200    05          250
200    10          400
300    08          350

Second vs. Third Normal Form
Example 3 - departments:
name    dept  dept_loc
smith   402   100
jones   401   200
king    402   100
turner  400   200
olson   401   200
Problem: the functional dependency is transitive
- The primary key is name
- dept is functionally dependent on name
- dept_loc is also functionally dependent on name, but transitively, because dept functionally determines dept_loc
Eliminate the transitive dependence by breaking the table into 2 relations using projection to obtain 3rd normal form:
name    dept
smith   402
jones   401
king    402
turner  400
olson   401
and
dept  dept_loc
400   200
401   200
402   100

Third vs. Boyce-Codd Normal Form
Example 5 - stock:
s#  sname  p#   qty
10  GE     102  1000
10  GE     103   625
10  GE     104  2000
20  TRW    102   500
20  TRW    105  1200
30  Syl    103  1300
- technically in 3NF
  - qty is the only non-key attribute (like example 1)
  - candidate keys are (s#, p#) and (sname, p#)
  - 3NF does not require the components of an alternate key to be fully functionally dependent on the primary key
Eliminate the multiple focus by breaking the table into 2 relations using projection to obtain Boyce-Codd normal form:
s#  sname
10  GE
20  TRW
30  Syl
and
s#  p#   qty
10  102  1000
10  103   625
10  104  2000
20  102   500
20  105  1200
30  103  1300
or [s#, sname] and [sname, p#, qty]

Fourth Normal Form
- 4NF and 5NF are relevant only when all attributes in the relation are parts of the key
- if a relation is in BCNF and has a non-key attribute, it is also in 5NF
Example 7 - skills: Suppose we wish to store employee job skills and language skills. (An employee may have many of each.)
employee  skill       language
Jones     electrical  French
Jones     electrical  German
Jones     mechanical  French
Jones     mechanical  German
In general: if the rows (Jones, x, A) and (Jones, y, B) exist, then the rows (Jones, x, B) and (Jones, y, A) must also exist.
The relation is in BCNF, because it is all key ... but there is redundancy.

Converting to 4NF
Ask the following questions:
- Could the relation have non-key attributes?
- Could any combination be missing?
If both answers are NO, the relation must be broken up to achieve 4NF.
Example 7 - skills: (employee, skill, language) should be broken up into two relations:
employee  skill
Jones     electrical
Jones     mechanical
and
employee  language
Jones     French
Jones     German
if job skill and language are independent.

Fifth Normal Form
PJ/NF, or Projection-Join Normal Form
- Deals with cases where information can be reconstructed from smaller pieces of information that can be maintained with less redundancy
Example 8 - dealerships:
1. Agents represent Companies
2. Companies make Products
3. Agents sell Products
Which Agent sells which Product for which Company?
Agent  Company  Product
smith  ford     car
smith  gm       truck
jones  ford     car
This form is necessary in the general case, BUT suppose we put into effect a rule that reads:
4. If an agent sells a product, and the agent represents a company that makes that product, then the agent sells that product for that company.
So, to obey the rule (both ford and gm make cars and trucks), we must add:
smith  ford  truck
smith  gm    car
NOW, with the rule and the new rows, we have REDUNDANCY.
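Each decomposition above is meant to be lossless: projecting the table into smaller relations and then joining them back must give exactly the original rows. A small Python sketch can check this for the Example 2 (inventory) and Example 7 (skills) data. The helper names rel, project, and natural_join are hypothetical, and the check applies only to the rows shown, not to the dependencies that justify the split.

```python
def rel(header, *rows):
    """Build a relation as a set of rows; each row is a frozenset of (attribute, value)."""
    return {frozenset(zip(header, row)) for row in rows}

def project(r, attrs):
    """Projection onto attrs; the set casts out duplicate rows."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in r}

def natural_join(r1, r2):
    """Combine every pair of rows that agree on all the attributes they share."""
    out = set()
    for a in r1:
        da = dict(a)
        for b in r2:
            db = dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(a | b)
    return out

# Example 2 - inventory: split off the warehouse facts (2NF).
inventory = rel(("part#", "warehouse#", "wh_address", "quantity"),
                (100, "05", "Mpls", 200), (100, "08", "StPaul", 300),
                (200, "05", "Mpls", 250), (200, "10", "Madison", 400),
                (300, "08", "StPaul", 350))
warehouses = project(inventory, {"warehouse#", "wh_address"})
stock = project(inventory, {"part#", "warehouse#", "quantity"})

# Example 7 - skills: split the two independent facts (4NF).
skills = rel(("employee", "skill", "language"),
             ("Jones", "electrical", "French"), ("Jones", "electrical", "German"),
             ("Jones", "mechanical", "French"), ("Jones", "mechanical", "German"))
emp_skill = project(skills, {"employee", "skill"})
emp_lang = project(skills, {"employee", "language"})

print(len(warehouses), natural_join(warehouses, stock) == inventory)            # 3 True
print(len(emp_skill), len(emp_lang), natural_join(emp_skill, emp_lang) == skills)  # 2 2 True
```

The same helpers could be applied to Example 3 by projecting onto (name, dept) and (dept, dept_loc).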
Converting to 5NF
This time, we must break the relation into three parts (it will not break into two).
Example 8 - dealerships:
Agent  Company  Product
smith  ford     car
smith  gm       truck
jones  ford     car
smith  ford     truck
smith  gm       car
BREAK INTO 3:
Agent  Company
smith  ford
smith  gm
jones  ford

Agent  Product
smith  car
smith  truck
jones  car

Company  Product
ford     car
ford     truck
gm       car
gm       truck
- A relation is already in 5NF if its information content cannot be reconstructed from several smaller record types (having different keys)
- There are 5NF problems only if there are symmetry constraints (a pair of rows requires the existence of one or more additional rows)

Domain/Key Normal Form
- No insertion/deletion anomalies
- Impossible to make an insertion/deletion that violates a constraint
- Constraint types: domain constraints and key constraints
Example 9 - customers:
branch  cust#
west    1234
south   1325
east    1421
south   1511
where the valid branches are west, east, and south

Enforcing Domain Integrity in DK/NF
Example 9 - customers:
branch  cust#
west    1234
south   1325
east    1421
south   1511
north   1600
If this update (adding the north row) is possible, the relation is not in DK/NF.
One possibility for prohibiting this update is to maintain a table of legal branches and write code to prohibit the entry of a branch not in the table:
legal branch
west
south
east
Problem: what's to stop someone from placing north in the legal branch table?
Possible partial solution: restrict access to the legal branch table.
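Instead of hand-written checking code, the legal-branch table can be enforced by the DBMS itself as a foreign key. This sketch assumes SQLite through Python's sqlite3 module; the table names legal_branch and customer, the column name cust_no (standing in for cust#), the function add_customer, and customer 1700 are all hypothetical. A CHECK (branch IN ('west', 'south', 'east')) column constraint would be another option. Neither approach stops a user who can write to legal_branch from adding north there, so restricting access still matters; SQLite has no GRANT statement, so that part is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE legal_branch (branch TEXT PRIMARY KEY);
CREATE TABLE customer (
    cust_no INTEGER PRIMARY KEY,
    branch  TEXT NOT NULL REFERENCES legal_branch (branch)
);
INSERT INTO legal_branch VALUES ('west'), ('south'), ('east');
INSERT INTO customer VALUES (1234, 'west'), (1325, 'south'),
                            (1421, 'east'), (1511, 'south');
""")

def add_customer(cust_no, branch):
    """Attempt the insert; the DBMS rejects a branch missing from legal_branch."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute("INSERT INTO customer VALUES (?, ?)", (cust_no, branch))
        return True
    except sqlite3.IntegrityError:
        return False

print(add_customer(1600, "north"))  # False: 'north' is not a legal branch
```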
Logical Data Structures
- Logical Database Design
- Logical Data Structures (LDS)
- Basic LDS Components
- Example Relationships
- Handling an M-M Relationship
- Identifier Representation
- Sample Database
- LDS for Sample Database
- LDS for Example 7 - Skills
- Correct LDS for Independence
- LDS for Example 8 - Dealerships
- Dealerships with Constraints
- Modelling Concepts
- Map LDS to Well-Formed Relations

Logical Database Design
- Constructive approach
- Considers semantics
- Documents data dependencies: identifiers, entities needed, relations, “rules”

Logical Data Structures (LDS)
- Graphical means of naming and depicting the types of data in a database
- Simple, yet precise
- Useful to both technically oriented analysts and application-oriented users
- Easy to read
- Supports the design task: logical structure design is hard, the tool aids the design task, and the notation does not get in the way

Basic LDS Components
Entity: any type of thing about which information is maintained
- Notation: entity_name. EXAMPLE: student
Attribute: a characteristic of exactly one entity (fully functionally dependent on the entity)
- Notation: attribute_name. EXAMPLE: student attributes student_name, student_id#, soc_sec#
Relationship: an association between a pair of entities (or “roles”)
- one-to-one or one-to-many only, but never many-to-many

Example Relationships
1-1 Example: monogamous marriage (man, woman)
- Can label the relationship: wife of man / husband of woman
1-M Example: students of a college (college, student)
- Need not label a relationship if it can be stated as: college of student / students of college, or student has college / college has students
Handling an M-M Relationship
M-M Example: Brother - Sister
- Entities man (man_name) and woman (woman_name), related by: sisters of man / brothers of woman
Problem: how do you represent the presence of sibling rivalry?
THIS WON'T WORK: entities man (man_name) and woman (woman_name) plus an attribute rivalry, because rivalry describes a brother-sister pair, not a man or a woman alone.
SOLUTION: add an intersection entity, brother-sister, between man and woman, and make rivalry an attribute of brother-sister.

Identifier Representation
Identifier: a set of attributes or relationships that uniquely identify an instance of an entity (either a single-field key or a multiple-field key)
Example: entity college (college#, college_name) and entity student (student_id#, student_name, soc_sec#), with each student related to a college

Sample Database
Employee (emp) attributes: Ename, Job, Mgr, Hired, Rate, Bonus
Department (dept) attributes: DeptNo, Dname, Loc, Dbudget
Task (task) attributes: Tname, Hours
Project (proj) attributes: Project_id, Description, Pbudget, Due_date
Relationships:
- employees are members of a department
- employees have a manager who is an employee
- employees are assigned to tasks on projects

LDS for Sample Database
Figure: entities dept (DeptNo, Dname, Loc, Dbudget), emp (Ename, Job, Mgr, Hired, Rate, Bonus), task (Tname, Hours), and proj (Project_id, Description, Pbudget, Due_date).

LDS for Example 7 - Skills
Employees can have many skills, and a skill can be had by many employees; an employee can know many languages, and a language can be known by many employees.
Figure: entities emp (employee), job_skill (skill), and lang (language), each related to a single intersection entity emp/lang/job_skill.
This diagram is correct if all 3 are interdependent.
Figure: emp (employee) related directly to job_skill (skill) and lang (language), with no intersection entity.
This diagram is almost never correct (it implies that a skill can be held by only one employee).
Correct LDS for Independence
Assuming job skills and language skills are independent, they represent two separate many-to-many relationships.
Figure: emp (employee) linked to job_skill (skill) through the intersection entity emp/job_skill, and to lang (language) through the intersection entity emp-lang.

LDS for Example 8 - Dealerships
In the general case, a contract involves one dealer, one manufacturer, and one product. A dealer can have many contracts, a manufacturer can have many contracts, and a product can be mentioned in many contracts.
Figure: entities dealership (agent), manufacturer (company), and vehicle (product), each related to the intersection entity contract.
This diagram is correct in the general case.

Dealerships with Constraints
Dealers can deal with many manufacturers, and manufacturers with many dealers. Dealers can sell many vehicle types, and vehicle types can be sold by many dealers. Manufacturers can make many vehicles, and vehicles can be made by many manufacturers. The combinations of who sells what are determined by symmetry.
Figure: entities dealer (agent), manufacturer (company), and vehicle (product), with the intersection entities dealer-mfgr, dealer-vehicle, and mfgr-vehicle.

Modelling Concepts
Entities:
- “it” must have an identifier, attributes, and relationships
- “it” must be the focus of the system
- need to develop for “it”: name, description, membership criteria
- must examine roles within subsets of “it”
Attributes:
- must be non-transitively fully functionally dependent on the entity they describe
- must develop for each attribute: name, description, domain definition
Modelling Concepts (Continued)
Identifiers:
- determine which attributes are part of it
- verify uniqueness
- establish “not null” requirements
Relationships:
- establish the degree: 1-1 or 1-M
- the entity on the 1 side must be functionally dependent on the entity on the M side
- develop: name, definition
- incorporate constraints and rules
- note referential integrity: values of a foreign key must exist in the key field of another relation (e.g. if the emp relation lists an employee as being in department 402, then the dept relation must contain a row with a key value of 402)

Map LDS to Well-Formed Relations
LDS                                    Relational Model
entity                                 relation name
attribute descriptor                   attribute
single-valued relationship descriptor  attribute (foreign key)
multi-valued relationship descriptor   nothing
1-1 relationship                       either or both relationship descriptors are attributes
1-M relationship                       the relationship descriptor with degree 1 (on the M side) is an attribute

LDS Relations Examples
Example: college students (college with college#, college_name; student with student_id#, student_name, soc_sec#)
college (college#, college_name)
student (student#, college#, student_name, soc_sec#), where college# is a foreign key (F.K.)

Sample Database Relations (see the LDS for Sample Database)
dept (DeptNo, Dname, Loc, Dbudget)
emp (Ename, Job, Mgr, Hired, Rate, Bonus, DeptNo), where Mgr is an F.K. within emp and DeptNo is an F.K.
proj (Project_id, Description, Pbudget, Due_date)
task (Tname, Ename, Project_id, Hours), where Ename and Project_id are F.K.s

Relations for Example 7 - Skills (based on the Correct LDS for Independence)
emp (employee)
job_skill (skill)
lang (language)
emp/job_skill (employee, skill), both F.K.s
emp-lang (employee, language), both F.K.s
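The Sample Database Relations above, including their foreign keys, can be declared as SQL tables so that the DBMS enforces referential integrity. This is a sketch assuming SQLite through Python's sqlite3 module; the column types, the function hire, and the employee 'ghost' and department 499 are illustrative assumptions. Only the DeptNo, Loc, and Dbudget values of two departments are loaded. The last insert fails because dept has no row with key 499, which is exactly the department 402 rule stated under Modelling Concepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.executescript("""
CREATE TABLE dept (
    DeptNo  INTEGER PRIMARY KEY,
    Dname   TEXT,
    Loc     INTEGER,
    Dbudget NUMERIC
);
CREATE TABLE emp (
    Ename  TEXT PRIMARY KEY,
    Job    TEXT,
    Mgr    TEXT REFERENCES emp (Ename),        -- F.K. within emp
    Hired  TEXT,
    Rate   NUMERIC,
    Bonus  NUMERIC,                            -- may be null
    DeptNo INTEGER REFERENCES dept (DeptNo)    -- F.K. to dept
);
CREATE TABLE proj (
    Project_id  TEXT PRIMARY KEY,
    Description TEXT,
    Pbudget     NUMERIC,
    Due_date    TEXT
);
CREATE TABLE task (
    Tname      TEXT,
    Ename      TEXT REFERENCES emp (Ename),
    Project_id TEXT REFERENCES proj (Project_id),
    Hours      INTEGER,
    PRIMARY KEY (Ename, Project_id, Tname)
);
INSERT INTO dept (DeptNo, Loc, Dbudget) VALUES (400, 200, 150000), (402, 100, 390000);
""")

def hire(ename, mgr, deptno):
    """Insert an employee; the DBMS rejects an unknown department or manager."""
    try:
        with conn:
            conn.execute("INSERT INTO emp (Ename, Mgr, DeptNo) VALUES (?, ?, ?)",
                         (ename, mgr, deptno))
        return True
    except sqlite3.IntegrityError:
        return False

# turner has no manager; barger reports to turner; 'ghost' and dept 499 are hypothetical.
results = [hire("turner", None, 400), hire("barger", "turner", 402),
           hire("ghost", "barger", 499)]
print(results)  # [True, True, False]
```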
Relations for Example 8 - Dealerships (see Dealerships with Constraints for the LDS with symmetry restrictions)
dealer (agent)
manufacturer (company)
vehicle (product)
dealer-mfgr (agent, company), both F.K.s
dealer-vehicle (agent, product), both F.K.s
mfgr-vehicle (company, product), both F.K.s

References
Carlos, John V. “Logical Data Structures.” Technical Report 85-23. Computer Sciences Department, Institute of Technology, University of Minnesota. 1986.
Codd, E. F. “Is Your DBMS Really Relational?” Computerworld. October 14, 1985.
Codd, E. F. “Relational Database: A Practical Foundation for Productivity.” Communications of the ACM. Vol. 25, Number 2 (February 1982).
Conte, Paul. “Understanding Relational Data Bases.” Computer Language. Vol. 4, Number 5 (May 1987).
Date, C. J. and Darwen, Hugh. A Guide to the SQL Standard. Third Edition. Addison-Wesley. 1993.
Date, C. J. An Introduction to Database Systems. Volume I, Sixth Edition. Addison-Wesley. 1995.
Date, C. J. An Introduction to Database Systems. Volume II. Addison-Wesley. 1984.
Date, C. J. Relational Database: Selected Writings. Addison-Wesley. 1986.
Date, C. J. “Where SQL Falls Short.” Datamation. May 1, 1987.
Harrington, Jan. Relational Database Management for Microcomputers. Holt, Rinehart, and Winston. 1987.
Kent, William. “A Simple Guide to Five Normal Forms in Relational Database Theory.” Communications of the ACM. Vol. 26, Number 2 (February 1983).
Markel, John. “Is ANSI-Standard SQL An Application Development Cure-all?” Hardcopy. May 1987.
Martin, James. Fourth-Generation Languages. Volume I: Principles. Prentice Hall. 1985.
Nolan, Richard L. “Managing the Computer Resource: A Stage Hypothesis.” Communications of the ACM. Vol. 16, Number 7 (July 1973).
Rob, Peter and Coronel, Carlos. Database Systems: Design, Implementation, and Management. Third Edition. Boyd and Fraser. 1997.
Sturm, Thomas P. Data Structures, Direct Access, and Database Management.