Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Oracle Database wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Relational algebra wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Concurrency control wikipedia , lookup
Clusterpoint wikipedia , lookup
ContactPoint wikipedia , lookup
44271: Database Design & Implementation Logical Data Modelling (Avoiding Database Anomalies) Ian Perry Room: C49 Tel Ext.: 7287 E-mail: [email protected] http://itsy.co.uk/ac/0405/sem3/44271_DDI/ What is a Logical Data Model? A ‘robust’ representation of the initial decisions made when building our Conceptual Data Model, which was composed of: Entities Attributes Relationships When I say ‘robust’ I mean that this model MUST ‘perform’ well with respect to a specific style/type of software. Ian Perry 44271: Database Design & Implementation: Logical Data Modelling Slide 2 Database Theories & Software Hardware independent, the match to ‘type’ of software is only concern, e.g.: Hierarchical DBMS Relational DBMS Object-based DBMS Each Database Theory addresses: Ian Perry Data Structure Data Integrity Data Manipulation 44271: Database Design & Implementation: Logical Data Modelling Slide 3 Database Theory = Relational Model First proposed by Dr. E. F. Codd in June 1970. Codd's model is now accepted as the definitive model for relational database management systems (RDBMS). Structured English QUEry Language ("SEQUEL") was developed by IBM Corporation, Inc., to use Codd's model. Codd E F, (1970), A Relational Model of Data for Large Shared Data Banks, Communications of the ACM, Vol. 13, No. 6, Pgs 377 – 387. SEQUEL later became SQL. In 1979, Relational Software, Inc. (now Oracle Corporation) introduced the first commercial implementation of SQL. Ian Perry SQL is the most widely used RDBMS manipulation language. 44271: Database Design & Implementation: Logical Data Modelling Slide 4 Relations look like Entities, but … Entity Staff(SCode, Name, Address, DoB, …) Relation Staff(SCode, Name, Address, DoB, DoE) Entity Contract(CCode, Site, Begin, End, …) May discover requirement for ‘extra’ Attributes, and also need to ‘complete’ our list of Attributes for each Relation. Can’t draw relationship lines, so need to ‘add’ extra attributes to Relations at the ‘M’ end of any ‘1:M’ relationships; e.g. 1 Staff “take part in” M Contract. Relation Ian Perry Contract(CCode, Site, Begin, End, SCode) 44271: Database Design & Implementation: Logical Data Modelling Slide 5 Use Tables to ‘flesh-out’ your Logical Model Staff(SCode, Name, Address, DoB, DoE) SCode 9491 7416 8912 Name Smith Day Jones Address 6 Shaw St 2 Sale St 15 Ayr Av DoB 13/02/65 14/01/57 28/12/76 DoE 03/10/98 22/11/02 01/03/04 Contract (CCode, Site, Begin, End, SCode) CCode 279 665 183 Site Hull York York Begin 27/02/05 14/09/04 04/03/05 End 03/03/05 02/12/04 16/06/05 SCode 9491 7416 9491 NB. Tables ARE NOT Relations! Ian Perry 44271: Database Design & Implementation: Logical Data Modelling Slide 6 Primary & Foreign Keys Most important Attributes in a Relation are know as ‘Keys’: Primary Key: One, or more, Attribute(s) that identify a unique occurrence of the ‘Entity’ that this ‘Relation’ represents. Foreign Key: of which there are two types. Attributes used (i.e. instead of the lines of an ER Diagram) to represent the presence of relationships. Often referred to as: Ian Perry The Primary/Foreign Key Mechanism. 44271: Database Design & Implementation: Logical Data Modelling Slide 7 Attributes, Domains & Relationships Attribute Values should be atomic (i.e. simple/single values only); e.g.: Set of eligible Attribute Values is known as an Attribute’s Domain; e.g.: ‘address’ should be separated into ‘street’ & ‘town’ & ‘postcode’. if we only have 100 members of staff, then the Domain of the ‘SCode’ Attribute could be “whole numbers between 1 & 100”. The Relational Model is weak at explicitly modelling relationships: Ian Perry Attributes in different Relations MUST HAVE same Attribute Domain for relationship to be possible. 44271: Database Design & Implementation: Logical Data Modelling Slide 8 Codd’s Rules Each Tuple (i.e. row) MUST BE unique, i.e.: Therefore: each Relation MUST HAVE a Primary Key. There may be many Candidates for the job of Primary Key, so select on basis of: need a way to discriminate between Tuples. uniqueness AND/OR minimality. Keys with more than one Attribute: Ian Perry are know as composite keys. 44271: Database Design & Implementation: Logical Data Modelling Slide 9 Rules for Integrity No Attribute that is part of the Primary Key can assume a ‘null’ value, else: how could we discriminate between Tuples? Foreign Key Attributes must take values that are either ‘null’, or from same Domain as the Primary Key Attribute to which they are logically linked, else: we will lose the possibility of making relationships. Ian Perry 44271: Database Design & Implementation: Logical Data Modelling Slide 10 Avoiding Database Anomalies Most Database books have a section describing a mathematically-based technique called Normalisation: I will show you a much easier way of achieving the same result. What we want to achieve is a ‘robust’ Logical Data Model; i.e. by: Ian Perry Transforming a Conceptual Data Model into a set of Relations. Checking these Relations for any Anomalies. Documenting them as a Database Schema. 44271: Database Design & Implementation: Logical Data Modelling Slide 11 What is an Anomaly? Anything we try to do with a database that may lead to unexpected and/or unpredictable results. Three types of Anomaly; i.e.: insert delete update Need to check your database design carefully: Ian Perry the only good database is an anomaly free database. 44271: Database Design & Implementation: Logical Data Modelling Slide 12 Insert Anomaly When we want to enter a value into a data cell but the attempt is prevented, as the primary key value is not known. CoNo Tutor Room RSize EnLimit 353 Smith A532 45 40 351 Smith C320 100 60 355 Clark H940 400 300 456 Turner H940 400 45 e.g. We have built a new Room (e.g. B123), but it has not yet been timetabled for any courses (so we don’t have a CoNo value). Ian Perry 44271: Database Design & Implementation: Logical Data Modelling Slide 13 Delete Anomaly When a value we want to delete also means we will delete values we wish to keep. CoNo Tutor Room RSize EnLimit 353 Smith A532 45 40 351 Smith C320 100 60 355 Clark H940 400 300 456 Turner H940 400 45 e.g. CoNo 351 has ended, but Room C320 will be used elsewhere. Ian Perry 44271: Database Design & Implementation: Logical Data Modelling Slide 14 Update Anomaly When we want to change a single data item value, but must update multiple entries CoNo Tutor Room RSize EnLimit 353 Smith A532 45 40 351 Smith C320 100 60 355 Clark H940 400 300 456 Turner H940 400 45 e.g. Room H940 has been improved, it is now of RSize = 500. Ian Perry 44271: Database Design & Implementation: Logical Data Modelling Slide 15 Conceptual Model & Translation Process Conceptual Model: Course M M Staff 1 M Student Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB, ...) Student(Enrol-No, Name, Address, OLevelPoints, ...) Course(CourseCode, Name, Duration, ...) Translation Process: Ian Perry Entities become Relations Attributes become Attributes(?) Key Attribute(s) become Primary Key(s) Relationships are represented by additional Foreign Key Attributes; for those Relations that are at the ‘M’ end of each 1:M Relationship. 44271: Database Design & Implementation: Logical Data Modelling Slide 16 The ‘Staff’ & ‘Student’ Relations Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB, ...) becomes: Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB) Student(Enrol-No, Name, Address, OLevelPoints, ...) becomes: Student(Enrol-No, Name, Address, OLevelPoints, Tutor) NB. Foreign Key Tutor references Staff.Staff-ID Ian Perry 44271: Database Design & Implementation: Logical Data Modelling Slide 17 The ‘Staff’ & ‘Course’ Relations Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB) Course(CourseCode, Name, Duration, ...) becomes: Course(CourseCode, Name, Duration) NB. Can’t ‘simply’ add extra attributes to act as Foreign Keys; as BOTH Relations have a ‘M’ end: I warned you about leaving M:M relationships in your Conceptual Data Model. MUST create an ‘artificial’ linking Relation. Ian Perry 44271: Database Design & Implementation: Logical Data Modelling Slide 18 ‘Staff’, ‘Course’ & ‘Team’ Relations Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB) Course(CourseCode, Name, Duration) Team(Staff-ID, CourseCode) NB. In the ‘artificial’ Team Relation: Primary Key is a ‘composite’ of CourseCode & Staff-ID Foreign Key CourseCode references Course.CourseCode Foreign Key Staff-ID references Staff.Staff-ID Ian Perry 44271: Database Design & Implementation: Logical Data Modelling Slide 19 4 Relations from 3 Entities? Student(Enrol-No, Name, Address, OLevelPoints, Tutor) Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB) Course(CourseCode, Name, Duration) Team(Staff-ID, CourseCode) OK, BUT are they anomaly free? • Is every Tuple unique? • i.e. is there a Primary Key. • Are the Attributes Atomic? • i.e. do they store only ONE item of data. • Does every Attribute within each Relation ‘depend’ upon the Primary Key? Ian Perry 44271: Database Design & Implementation: Logical Data Modelling Slide 20 What if the checks fail? If any Relation fails ‘checks’: we MUST split that Relation into multiple Relations: especially those checking dependency. until they pass the tests. but MUST remember to leave behind a Foreign Key: Ian Perry to ‘point’ forwards to the Primary Key of the ‘new’ split-off Relation. 44271: Database Design & Implementation: Logical Data Modelling Slide 21 Are they Anomaly Free? NOT very Atomic; Could easily be split into ‘Street’, ‘Town’ & ‘PostCode’. Student(Enrol-No, Name, Address, OLevelPoints, Tutor) Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB) Course(CourseCode, Name, Duration) Team(Staff-ID, CourseCode) Ian Perry NOT Dependent upon Staff-ID; Requires a slightly more complex ‘solution’. 44271: Database Design & Implementation: Logical Data Modelling Slide 22 ‘Fixing’ the Dependency ‘Problem’ Staff(Staff-ID, Name, ScalePoint, RateOfPay, DOB) The Attribute ‘RateOfPay’ depends upon ‘ScalePoint’ NOT ‘Staff-ID’. So, we MUST remove ‘RateOfPay’ from the ‘Staff’ Relation, like this: Staff(Staff-ID, Name, ScalePoint, DOB) Pay(ScalePoint, RateOfPay) NB. In the ‘Staff’ Relation: Foreign Key ScalePoint references Pay.ScalePoint Ian Perry 44271: Database Design & Implementation: Logical Data Modelling Slide 23 5 Relations from 3 Entities Student(Enrol-No, Name, Street, Town, PostCode, OLevelPoints, Tutor) Staff(Staff-ID, Name, ScalePoint, DOB) Course(CourseCode, Name, Duration) Team(Staff-ID, CourseCode) Pay(ScalePoint, RateOfPay) Now all we need to do: Ian Perry Is to document our ‘Anomaly Free’ Relations as a Database Schema. 44271: Database Design & Implementation: Logical Data Modelling Slide 24 Document Relations as a Database Schema A Database Schema: We may/should have ‘discovered’ a number of constraints during our analysis of the Business situation, e.g: defines all Relations, lists all Attributes (with their Domains), and identifies all Primary & Foreign Keys. the College only delivers 10 Courses. there are only 12 Points on the Pay Scale. Staff MUST be at least 21 Years Old. These constraints can/should be expressed as the ‘Domains’ of the Database Schema. Ian Perry 44271: Database Design & Implementation: Logical Data Modelling Slide 25 Logical Schema 1 - Domains Schema College Domains Ian Perry StudentIdentifiers = 1 - 9999; StaffIdentifiers = 1001 - 1199; GeneralNames = TextString (15 Characters); Addresses = TextString (20 Characters); PostCodes = TextString (7 or 8 Characters); CourseIdentifiers = 101 - 110; OLevelPoints = 0 - 100; ScalePoints = 1 - 12; StaffBirthDates = Date (dd/mm/yyyy), >21 Years before Today; 44271: Database Design & Implementation: Logical Data Modelling Slide 26 Logical Schema 2 - Relations Relation Student Ian Perry Enrol-No: StudentIdentifiers; Name: GeneralNames; Street: Addresses; Town: Addresses; PostCode: PostCodes; OLevelPoints: OLevelPoints; Tutor: StaffIdentifiers; Primary Key: Enrol-No Foreign Key Tutor references Staff.Staff-ID 44271: Database Design & Implementation: Logical Data Modelling Slide 27 Logical Schema 3 - Relations Relation Staff Staff-ID: StaffIdentifiers; Name: GeneralNames; ScalePoint: ScalePoints; DOB: StaffBirthDates; Primary Key: Staff-ID Foreign Key ScalePoint references Pay.ScalePoint Ian Perry 44271: Database Design & Implementation: Logical Data Modelling Slide 28 Logical Schema ... Relation Course CourseCode: CourseIdentifiers; Name: GeneralNames; … etc. Continue to define each of the Relations in a similar manner. NB. Make sure that you define ALL of the Relations, including: Ian Perry ‘artificial’ ones (e.g. Team) ‘split-off’ ones (e.g. Pay) 44271: Database Design & Implementation: Logical Data Modelling Slide 29 This Week’s Workshop The purpose of this week’s Workshop is to practice developing ‘robust’ logical data models that conform to the ‘rules’ of Codd’s Relational Model. Ian Perry Exploring the ‘definition’ of Relations. Identifying potential anomalies in a Table of data, and ‘solving’ these ‘problems’. Documenting a Database Schema (i.e. a Logical Model), in the format required by Part 2 of the Assignment. 44271: Database Design & Implementation: Logical Data Modelling Slide 30