* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Database Introduction -2
Oracle Database wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Ingres (database) wikipedia , lookup
Concurrency control wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Relational algebra wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Clusterpoint wikipedia , lookup
ContactPoint wikipedia , lookup
And Franchise Colleges HSQ - DATABASES & SQL Supplement 01(b) Database-Introduction-2 By MANSHA NAWAZ Supplement 01 (b) Database Introduction-2 The Database Concept (4GL TOOLS & TECHNIQUES) • A Database A data base is a collection on non-redundant data shareable between different application systems. [Howe, D.R. 1989] – This does not imply a computer system. – Is a well organised filing cabinet a database by this definition? • A Database Management System (DBMS) – A sophisticated software development package capable of handling a systems database stored needs. • We are particularly interested in Relational Database Management Systems in this module (RDBMS). Supplement 01 (b) Database Introduction-2 Application Systems Sharing Data Admission System Operation Scheduling Report Medical Staff Report Database • The Applications (Application Systems / Programs etc. – The Admission System uses patient and medical staff data. – The Operation Scheduling Report uses operating theatre, patient and medical staff data. – The Medical Staff Report uses medical staff data. • The application systems share data. Supplement 01 (b) Database Introduction-2 Data Models • A Data Model provides a particular way of thinking about data, at least in terms of its structure Data Models include data descriptions, data relationships, data semantics, consistency constraints [Silberschatz et. al.] Data model comprises three components: data structures, data manipulators, and general integrity rules [Codd (1970)] • There are many types of data model. – Hierarchic – Entity-Relationship • - Network - Object-Oriented - Relational - Multimedia - Multidimensional - etc. Each database uses a definition language – imposes restrictions on • what can be defined • how entities relate to each other • In this module we are interested mainly in the Relational and Entity-Relationship data models. • Why? The principles involved in Entity Relationship Modelling apply to all data models Supplement 01 (b) Database Introduction-2 The Relational Model Relational model first proposed in 1970 by Dr E F (Ted) Codd in the paper ‘A relational model of data for large shared data banks’. – Achieve program/data independence by treat data in a disciplined way – Has a mathematical basis – the term “relation” comes from this • Apply rigour of mathematics • Use set theory • Determining data structure. – Data is stored in a structure of relations (tables) defined by a data definition language (DDL). – The elements of data structure used in relational models are relations (tables), attributes, tuples (rows), and domains. [Rolland p39-44] • Defining data integrity. – Data integrity means that data remains stable, secure, and accurate. – It is maintained by internal constraints known as integrity rules that are invisible to users. [Rolland p45-48] Supplement 01 (b) Database Introduction-2 The Relational Database • • A relational database is made up of relations (tables) in which data are stored. A relation (table) is a 2-dimensional structure made up of attributes (columns) and tuples (rows). Relation • A relation is a table that obeys the following rules: – – – – There are no duplicate rows in the table. The order of the rows is immaterial. The order of the columns is immaterial. Each attribute value is atomic, ie each cell can contain one and only one data value. Supplement 01 (b) Database Introduction-2 Example of a Table (Relation) • • Relations can be manipulated and changed using a data manipulation language (DML) that employs relational operators. – These operators are based on the concepts of relational algebra. Information is represented as two dimensional tables as below. ANIMAL TABLE ANIMAL Supplement 01 (b) ANAME AFAMILY Candice Camel 1800 Zona Zebra 900 Sam Snake 5 Elmer Elephant 5000 Leonard Lion 1200 Database Introduction-2 WEIGHT Tables and Keys – for relationships • • • • A primary key is a unique identifier for each row in a table. Can consist of one or more columns. Each table contains data about one entity. A foreign key is a column or columns in one table which reference(s) a primary key column or columns in another table. Values in a foreign key must match an existing value in the primary key or be NULL. This is known as the referential integrity rule. ANO in the ANIMAL-FOOD table is part of the primary key and also a foreign key. ANIMAL-FOOD ANO FOOD ANIMAL CA1 Hay ANO ANAME AFAMILY WEIGHT CA1 Buns CA1 Candice Camel 1800 ZE4 Brush ZE4 Zona Zebra 900 SN1 Mice SN1 Sam Snake 5 SN1 People EL3 Elmer Elephant 5000 EL3 Leaves LI2 Leonard Lion 1200 LI2 People LI2 Meat Supplement 01 (b) Database Introduction-2 Relational Database Terminology • • • • • • • Relation Tuple Attribute Primary key Domain Degree Cardinality Supplement 01 (b) a table with rows and columns a row of a relation a named column of a relation a unique identifier for each row in a relation the set of allowable values for a column the number of columns in a relation the number of rows in a relation Database Introduction-2 Flight Table Example Flight: Flight# BA143 BA142 KT222 KT401 KT402 KT111 KT112 DE477 DE478 BA101 BA102 Origin NAP ROM LHR JFK DUL CCG MIA ATH CDG EDI LHR Destination ROM NAP JFK DUL JFK MIA CCG CDG ATH LHR EDI Arrival 10.15 10.22 10.34 10.45 10.54 11.06 11.11 11.34 11.56 12.04 12.33 Table type usually written as follows: Flight: (Flight#, Origin, Destination, Arrival) Supplement 01 (b) Database Introduction-2 Relational Database Terminology Flight: Key Field Tuple, Row or Record Flight# BA143 BA142 KT222 KT401 KT402 KT111 KT112 DE477 DE478 BA101 BA102 Origin NAP ROM LHR JFK DUL CCG MIA ATH CDG EDI LHR Destination ROM NAP JFK DUL JFK MIA CCG CDG ATH LHR EDI Arrival 10.15 10.22 10.34 10.45 10.54 11.06 11.11 11.34 11.56 12.04 12.33 Column, Attribute, Type or Field Intension or table type Attribute Value, Data or Value Domain = set of values drawn upon by a particular attribute Cardinality = No. of Rows in a relation Q: Which is most Degree = No. of Columns in a relation likely to change? Supplement 01 (b) Database Introduction-2 Your turn – define the following terms!! Order: Client# C1 C2 C2 C2 C3 C4 Value Domain Attribute type Attribute Value Tuple Extension Supplement 01 (b) Part# P1 P1 P2 P3 P2 P1 Qty-ordered 25 12 8 4 3 20 Row Degree of Order Cardinality of Order Table Type Table Occurrence Relation Database Introduction-2 Date 22/08/99 24/08/99 24/08/99 24/08/99 25/08/99 25/08/99 Column Intension Record Field Attribute Why Tables ? Shop: Supplement 01 (b) Shop ID 1 2 3 4 5 6 7 Area-code 2 1 1 2 3 4 1 Location Edinburgh London London Edinburgh Birmingham Ipswich London Database Introduction-2 Primary Key Data Duplication Shop: Area: Supplement 01 (b) Shop ID 1 2 3 4 5 6 7 Area-code 2 1 3 4 Area-code 2 1 1 2 3 4 1 Location Edinburgh London Birmingham Ipswich Database Introduction-2 Duplication of primary key data maintains the link (relationships) between tables (data) Normalisation is the process used to make sure tables are nonredundant Keys within the Relational Data Model • • The Primary Key is the field which uniquely identifies a record The Primary Key (Unique Identifier) concept: Example : student (student# , name , …) (the # character means 'number’) • If student# is the primary key then a particular student#, e.g. 'S4', can only occur once in that column of the table. student# S4 S2 S9 S11 Supplement 01 (b) name Ramesh Peter Anthony Priti Database Introduction-2 • The Primary Key uniquely identifies a student and thus that student can only have one row in the student table. • Breaking that rule …. student# S4 S2 S9 S11 S4 • • name Ramesh Peter Anthony Priti Fred The new row in the table does not make much sense. – The first row for 'S4' is sufficient to hold the name and we cannot allow a second row with 'S4' as the primary key value. Why? Supplement 01 (b) Database Introduction-2 • Example TASK (employee#, project#, role, supervisor, ours_allocated, hours-so-far, hours-required , …) employee# project# role supervisor hours-allocated hours-so-far hours-required E2 P9 program E123 120 85 100 E2 P4 design E101 300 250 200 E101 P9 design E101 60 128 56 E22 P11 test E345 40 0 40 • This table has a composite primary key. – The primary key is composed of the three attributes [employee#, project# , role]. • In each row, the composite of the values for these attributes must be unique. – For example, the first row has the values ['E2, P9, program'] for these attributes. – No other row is allowed to have the same combination. Supplement 01 (b) Database Introduction-2 • Breaking that rule … employee# project# role supervisor hours-allocated hours-sofar hoursrequired E2 P9 program E123 120 85 100 E2 P4 design E101 300 250 200 E101 P9 design E101 60 128 56 E22 P11 test E345 40 0 40 E2 P9 program E99 12 0 2000 • Again this makes no sense or the basic design is wrong – Perhaps an employee can be re-allocated to a project, with a different supervisor, to do more programming. – In this case the chosen primary key is wrong and needs the addition of an extra attribute such as date. • For example, [employee#, project# , role, start_date] might be an appropriate identifier. Supplement 01 (b) Database Introduction-2 Foreign Keys (Posted Identifier) concept • Example: student(student#, course#, student_name, …) course (course#. course_name, …) • course# is the identifier (primary key) of the course table. • The course# is posted into the student table and is thus called a FOREIGN KEY (or posted identifier). – Now for any student we can easily find the appropriate course# and look up futher details of that course in the course table is needed. – This is easy to do in any relational database. Supplement 01 (b) Database Introduction-2 Candidate Keys (Candidate Identifier) – A favourite example • • Employee(emp#, name, address, National_Insurance#, ..) Or Employee(National_Insurance# , name, address, emp#,..) • Both National_Insurance# and emp# can be primary keys (unique identifiers) of employee. • You choose one as the most appropriate from the two candidate (possible) keys. • You could argue that the composite [name, address] is another candidate primary key. Supplement 01 (b) Database Introduction-2 Examples - Primary and Foreign keys Shop: Visit: Shop ID Area-code 1 2 2 1 3 1 4 2 5 3 6 4 7 1 Visit# V1 V2 V3 V4 V5 Area: Area-code 2 1 3 4 Supplement 01 (b) Location Edinburgh London Birmingham Ipswich Patient# P1 P1 P2 P3 P1 Doctor# D1 D1 D1 D1 D2 Date 22/08/99 24/08/99 24/08/99 24/08/99 25/08/99 Order: Client# Client# C1 C2 C2 C2 C3 C4 Database Introduction-2 Part# Part# P1 P1 P2 P3 P2 P1 Qty-ordered Qty-ordered 25 12 8 4 3 20 Date Date 22/08/99 24/08/99 24/08/99 24/08/99 25/08/99 25/08/99 Examples - Primary and Foreign keys Employee: Marriage: Supplement 01 (b) Emp# E1 E2 E3 E4 E5 E6 Man# Man# P1 P2 P2 P3 P4 P5 Emp_name Fred Brown Eve Munsen Joyce Goldberg Paul Samuels Paul Josephs Terry Wain Women# Women# P6 P7 P8 P9 P10 P8 Database Introduction-2 Dept# Mtg R&D Admin Mtg R&D Production Date Date 22/04/94 23/08/95 24/04/97 2/01/99 5/07/99 5/08/99 Status Manager Manager G1 G4 G3 Manager Anatomy of a table - a reminder A Table Occurrence: using a variation of the Library Copies table access_no isbnx price now_price condition timesloaned 4,887,642 0-7131-3688-X £12.95 02.06.92 A2 4 4,887,657 0-7131-3688-X £12.95 17.09.91 B1 47 6,055,432 0-7248-1045-5 £37.65 12.04.92 A2 17 9,387,263 0-6542-1212-B £15.99 14.02.91 B2 37 7,365,241 0-2435-3468-V £27.40 19.11.91 A3 7 3,874,652 0-2435-3468-V £27.40 19.11.91 A1 11 Attribute: Example: date-purchased Value: Example: 02.06.92 Table Example: COPIES(access_no, isbnx, price, now_price, condition, times-loaned) – What is special about isbnx in this table? Supplement 01 (b) Database Introduction-2 The 4 Rules for Normalised Tables [Rolland p72-] • No row order significance. • No column order significance. • No multiple values at row/column intersections. • No duplicate rows. • Snapshots of table occurrences. – When we look at a paper copy of a table remember that the data in a real database table can be expected to change all the time. – The COPIES table could have 5 rows the first time we look and on another day there could be hundreds or thousands. – Always table really has thousands of rows. Supplement 01 (b) assume a databaseDatabase Introduction-2 The 4 rules for Normalised Tables broken • No row order significance (Rule broken). access_no isbnx price now_price condition timesloaned 4,887,642 0-7131-3688-X £12.95 02.06.92 A2 4 17.09.91 B1 47 4,887,657 6,055,432 0-7248-1045-5 £37.65 12.04.92 A2 17 9,387,263 0-6542-1212-B £15.99 14.02.91 B2 37 7,365,241 0-2435-3468-V £27.40 19.11.91 A3 7 19.11.91 A1 11 3,874,652 • If you swap the rows you lose information as copies are sometimes dependent on the row above for their ISBNX number. Supplement 01 (b) Database Introduction-2 No Column Order Significance (Rule broken). access_no • • • isbnx price condition times-loaned 4,887,642 0-7131-3688-X £12.95 02.06.92 A2 4 4,887,657 0-7131-3688-X £12.95 17.09.91 28.07.93 B1 47 6,055,432 0-7248-1045-5 £37.65 12.04.92 A2 17 9,387,263 0-6542-1212-B £15.99 14.02.91 31.08.93 B2 37 7,365,241 0-2435-3468-V £27.40 19.11.91 A3 7 3,874,652 0-2435-3468-V £27.40 19.11.91 A1 11 The two columns with no attribute type shown are intended to indicate the datepurchased followed by the date-removed. – The date-removed column would contain a significant number of NULLS (explained later). The two columns are now dependent on the column order for their meaning. – If you move the first date column to the end of the table then the meaning is lost. Clearly having each column with its own attribute type is simpler and makes the columns order independent of each other. Supplement 01 (b) Database Introduction-2 No Multiple Values at Row/column Intersections (Rule Broken). • Or No Repeating Groups access_no isbnx price date condition timesloaned 4887642, 4887657 0-7131-3688-X £12.95 02.06.92, 17.09.91 A2, B1 4, 47 6,055,432 0-7248-1045-5 £37.65 12.04.92 A2 17 9,387,263 0-6542-1212-B £15.99 14.02.91 B2 37 7365241, 3874652 0-2435-3468-V £27.40 19.11.91, 19.11.91 A3, A1 7, 11 • This is just too complicated – it makes searching and sorting difficult. • Can you sort this into date order?? • Can it be easily searched on access_no ?? Supplement 01 (b) Database Introduction-2 No Duplicate Rows (Rule broken). access_no isbnx price date condition times-loaned 4,887,642 0-7131-3688-X £12.95 02.06.92 A2 4 4,887,657 0-7131-3688-X £12.95 17.09.91 B1 47 6,055,432 0-7248-1045-5 £37.65 12.04.92 A2 17 9,387,263 0-6542-1212-B £15.99 14.02.91 B2 37 7,365,241 0-2435-3468-V £27.40 19.11.91 A3 7 3,874,652 0-2435-3468-V £27.40 19.11.91 A1 11 4,887,642 0-7131-3688-X £12.95 02.06.92 A2 4 • Why is this such a BAD idea? • If you allow redundantly duplicated data in a real system, what will be the end result? Supplement 01 (b) Database Introduction-2 Domains • An attribute cannot contain just any data. – For example we could have the attribute student_date_of_birth – Whilst '11-January-1980' might be a suitable value, 'FRED BLOGGS' clearly is not - the wrong data type. • So any value of student_date_of_birth at least should be a valid date. – Other rules might apply to the attribute - dates before the year 1900 seem unlikely to be useful etc. • The Domain concept carries this a bit further. – A Domain is the pool of values from which an attribute draws its actual values. Supplement 01 (b) Database Introduction-2 • We may say that the attribute student_name is of data type string. – So 'JOE BLOGGS' is a valid value. • However, 'X23Y B&&9' is also a string but is not a student name. (unless that student has particularly annoying parents). • We could argue that there the finite list of possible student names is a subset of all possible random strings. – We can't predict what a student will be called. – Thus we have to implement the domain of student_name by using the data type string. • In other cases, for example the attribute, student_eye_colour, we could easily define a list of values that defines the full range of possibilities. Supplement 01 (b) Database Introduction-2 NULLS • There are two basic types of NULL value 'not applicable' and 'not known'. emp# emp-name age car-reg# E4 D.Jones 34 F345DRT E77 L.Smith 27 E9 J.Smith E2 N.Patel G467BBT 55 Null(Not known) K976BJT Null (Not applicable) • Every employee would have an age but it might not be known in a particular case. • However, not every employee need own a car ( with a car-reg#). Supplement 01 (b) Database Introduction-2 Summary • Introduction to Database Concepts • Introduction to Data Modelling • Introduction to Databases and Redundancy. • Relational Data Model • Terminology associated with Relational Data Model. • Duplicated Data • Primary and Foreign Keys • Duplicated Data (Foreign key to Primary key references to link data in tables) but not redundant. • Additional supplementary material on Normalisation available Supplement 01 (b) Database Introduction-2 END OF LECTURE Supplement 01 (b) Database Introduction-2