* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Summer Class 3
Concurrency control wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Functional Database Model wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Clusterpoint wikipedia , lookup
Business Analysis and Data Design ITEC-630 Fall 2008 Business Data Design Professor J. Alberto Espinosa Agenda • Introduction to database concepts • Data modeling & relational database design • Transitional artifacts: the CRUD matrix – linking requirements to data design • Normalization • Database queries 2 Database Concepts 3 Definitions Database: An organized collection of “logically related” data that can be retrieved on demand Database Management System (DBMS): Software that manages databases (i.e., define, create, update, and query databases) Acts as intermediary between business applications and physical data files “Most powerful, scalable, flexible and effective business applications rely on a well designed database and a powerful underlying DBMS” 4 The Old Way: Programs and Data files Data and program files were separate. You had to write individual programs to: define the data; upload it; update it; manipulate it; and or retrieve it Examples: Data Files Application Programs Accounting, Human Resources API1 System Software Windows Unix, Linux INSTRUCTION SET HARDWARE PC, Mainframe 5 A Better Way: Using a DBMS A business application passes high level instructions to the DBMS. The DBMS has capabilities to do all the necessary data management: data definition, manipulation, and retrieval. So, the business application does not have to worry about low level data management functions Examples: Database Application API2 Database DBMS Accounting, Human Resources, ERP, CRM Oracle, Access, MS SQL Server API1 System Software Windows Unix, Linux INSTRUCTION SET HARDWARE PC, Mainframe 6 Advantages of Using Databases & DBMSs • • • • • • • • Programs independent of data structure Less data redundancy Better consistency in the data More flexibility & scalability Easier to integrate & share data Easier to develop business applications Easier to enforce business rules/constraints Easier access to data by users (e.g., queries, reports, forms, etc.) 7 Stand-alone DBMS DBMS and database work in the same computer: the user’s computer OK for personal productivity Stand-alone DBMS (e.g., MS Access) Database 8 DBMS in a Client/Server Environment: Better for corporate use the DBMS has two components DBMS Server: runs the “back-end” part of the DBMS and performs most of the data management functions – e.g., queries, updates, etc. DBMS Client: runs “front-end” part of the DBMS that provides the user interface (e.g., data entry, screen displays or presentation, report formatting, query building tools) DBMS Client Data Request (e.g., query) Response (e.g., query result) DBMS Server Database Retrieve, add, delete and/or update data 9 DBMS in a Web Server Environment: Very common when there are large numbers of users and would be impractical to deploy and install a DBSM client access to the database is done through a browser (e.g., on-line purchases) Request (ex. get a price quote, place an order) Response (ex. query results with HTML-formatted product price or order confirmation notice) 10 Business to Business E-Commerce Example using XML e.g., supplier INSERT query XML Processor DBMS (e.g., MS SQL Server) XML Document (e.g., Purchase Order) Internet XML Processor e.g., buyer SELECT query XML Document (e.g., Purchase Order) DBMS (e.g., Oracle) 11 Data Warehouse “A database that stores and consolidates current and historical data from various systems (internal and external) with tools for management reporting and sophisticated analysis—i.e., Datamining” Business Intelligence 12 Most Common Database Models • • • • Hierarchical (of historical interest only) Network (of historical interest only) Relational Object Oriented (new) 13 Relational Database • For a database to be truly relational, it must comply with 12 rules defined by its inventor (Dr. E. F. Codd). • No commercially available database complies with the full set of rules, but the 12 rules are used as guidelines for sound database design. • Rule 1 states that data should be presented in tables • Rule 2 states that data must be accessible without ambiguity • We will talk more about other rules later (i.e., about entity integrity and referential integrity – stay tuned). 14 Implications about Rule 1 A relational database must have: • • • Tables: or “entities” Every table has a unique name Ex. Students, Courses Fields: or “columns”, “attributes” Every field has a unique name within the table Ex. Students (StudentID, StudentName, Major, Address) Ex. Courses (CourseNo, CouseName, CreditPoints, Description) Records: or “rows”, “tuples”, “instances” Every record is unique (has a unique field that identifies it) Ex. {“jdoe”, “John Doe”, “CS”, 5000 Forbes Ave.) Ex. {“MGMT-352-001”, “MIS”, Fall 2002, “A great course”} 15 Implications about Rule 2 Unambiguous reference Table.Field WHERE Record Search Ex. Students.StudentID refers to the StudentID field/column of the Students table Ex. Courses.CourseName WHERE CourseNo = “ITEC-630” more specifically, refers to the value in the CourseName field/column of the Courses table in the record/row in which the field/column CourseNo is “ITEC-630”, that is “System Requirements” 16 Object Oriented (OO) Databases • • • • • • OO languages + added database functionality, or Database products + added OO programming facilities Similar to relational databases “Classes” (a grouping of similar objects -- like tables) “Objects” (an instance of a class -- like records) “Object properties” (object attributes -- like fields) • Plus: – Methods (i.e., procedures or programs) Programs embedded in classes and objects – Other OO Properties (inheritance, encapsulation, etc.) 17 Terminology Equivalence ERD or Data Model OO Database Relational Database Entity Class Table Instances Objects Records Relationship Relationship Relationship Attributes Properties Fields Other Terms Used Rows, Tuples Columns 18 Database Management Systems (DBMS) Software that manages databases i.e., define, create, update, and query databases e.g., MS Access, MS SQL Server, Oracle 19 DBMS Functions and Tools • Performs 3 main functions: – Data definition (define, create databases) – Data manipulation (data entry, updates) – Data retrieval (extraction, reports, displays) • Plus additional database tools: – Data dictionary: data about the database – Visual tools: report & form design – Data modeling & database design tools – Macros and programming languages – Internet/web features, etc. • Examples: – Oracle, DB2, Visual FoxPro, MS Access & MS SQL 20 Database Design 21 Database Design Goals • • • • • Data integrity Avoid anomalies in the data No data redundancy Record the data in one place only Efficient data entry Duplicate data means having to enter the same data more than once Consistency Duplicate data can lead to inconsistencies when the data changes e.g., 2 different addresses for same client Flexibility and easy evolution East to maintain, update and add new tables 22 Database Design Issue #1: Enforce Entity Integrity 23 Entity Integrity • • Is ensuring that every record in each table in the database can be addressed (i.e., found) – this means that there each record has to have a unique identifier that is not duplicate or null (i.e., not blank) Examples: every student has an AU ID; every purchase order has a unique number; every customer has an ID Primary key (PK) helps enforce Entity Integrity: • Field(s) that uniquely identifies a record in a table (e.g., AU user ID) • Entity integrity = PK is not duplicate & not blank • PK can be: – A single field (e.g., UserID), or – Or more than one field (e.g., OrderNo, LineItem) 24 Database Design Issue #2: Enforce Referential Integrity 25 Referential Integrity • Is ensuring that the data that is entered in one table is consistent with data in other tables • Examples: purchase orders can only be placed by valid customers; accounting transactions can only be posted to valid company accounts Foreign key (FK) helps enforce referential Integrity: • A field in a table that is a PK in another table • That is, a field that “must” exist in another table • This is how referential integrity is maintained 26 Illustration: Primary and Foreign Keys PK FK PK 27 Entity, Referential Integrity PK PK PK, FK PK FK Database Schema: The structure of the database, which contain tables, views, constraints, relations, etc. – just about everything, except the data itself PK, FK 28 Other Important Keys • Candidate Keys: – Often there are more than one keys that could serve as a primary key – Example: Order, LineItem vs. Order, ProdID – Example: AU ID, SSN, AU Login ID – These are called candidate – Any candidate can be selected as the primary key • Alternative Keys: – Once a primary key has been selected from the choice of candidate keys, the other keys (not used as PKs) are referred to as “alternative keys” 29 Database Design Issue #3: Develop a Data Model or Entity-Relationship Diagram (ERD) 30 Data Model Example (Entity Relationship Diagram--ERD): Course Registration System Courses Instructors CourseNo CourseDescription Many InstructorID CreditPoints PreRequisites ClassroomNo 1 Includes InstructorID Teach 1 Relationships StudentID Enrollments Comments Entities Students Many StudentID CourseNo LastName FirstName Telephone EMailAddr Enrolls Many 1 LastName FirstName SSN Department College Major EMailAddr 31 Data Model Example (Entity Relationship Diagram--ERD): Course Registration System Cardinality 1 to Many Enrolls Entities Relationships 32 The Textbook’s ERD Notation Entities InstructorID LastName CourseNo FirstName Teach Instructors Telephone InstructorID (FK) EMail CourseDescr Courses CreditPoints PreReqs Relationships 33 Peter Chen’s ERD Notation Instructors PK Course InstructorID Teaches LastName FirstName Telephone EMail PK FK1 CourseNo CourseDescription InstructorID CreditPoints PreRequisites 34 Entity-Relationship Diagrams (ERDs) i.e., Conceptual Data Modeling • • • • Similar to a class diagram, but without methods and generalizations Data-oriented modeling method that describes the data and relationships among data entities Goal: capture meaning of the data 2 main ERD or data model constructs: – Entities and its attributes – Relationships between entities 35 Entity “An object, person, place, event or thing or which we want to record data” • • Equivalent to a table in a database Examples: instructors, students, classrooms, invoices, registration, machines, countries, states, etc. • Entity instance: a single occurrence of an entity Example: Espinosa, Kogod 39, ITEC 630 • Entities can be identified in a requirements analysis description by following the use of NOUNS 36 Relationships • Relationships describe how two entities relate to each other • Relationships in a database application can be identified following the VERBS that describe how entities are associated with one another • Examples: students enroll in courses countries have cities, etc. 37 Cardinality • Cardinality is an important database concept to describe how two entities are related • The Cardinality of a relationship describes how many instances of one entity can be associated with another entity The cardinality of a relationship between two entities has two components: – Maximum Cardinality: is the maximum number of instances that can be associated with the other entity – usually either 1 or many (the exact number is rarely used) – Minimum Cardinality: is the minimum number of instances that can be associated with the other entity – usually either 0 or 1 – Symbols: 0 1 Many • 38 Cardinality (cont’d.) • A relationship is fully described by describing the cardinality in both directions of the relationship: e.g., a client places zero (i.e., optional) or many orders and each order must relate to only one (i.e., mandatory) client. • Examples: 1 student can only park 1 (or 0) cars 1 to (0 or) 1 1 client can place (0 or ) many orders 1 to (0 or) many 1 student can enroll in (at least 1 or) many courses and a course can have (0 or) many students (0 or) many to (1 or) many 39 Example: 2 Entities, 1 Relationship Instructors PK Zero or many InstructorID Teaches LastName FirstName Telephone EMail One and only one Course PK FK1 CourseNo CourseDescription InstructorID CreditPoints PreRequisites Peter Chen’s notation & MS Visio software 40 ERD SYMBOLS (cont’d.) Note: high level conceptual models don’t show attributes, just entities Employee BioData Has 1 to 1 Maximum Cardinality (outer symbol) Employee Has Mandatory FamilyData Optional Minimum Cardinality (inner symbol) Peter Chen’s notation using Systems Architect software 41 ERD SYMBOLS (cont’d.) → Advises ← Have Advisor Student 1 to Many Maximum Cardinality Faculty 1 to Many (or None) Mandatory Teaches Minimum Cardinality Course Optional Peter Chen’s (“crow’s feet”) notation using Systems Architect software 42 Many to Many Relationships? Many to Many Orders Products Convert a Many-to-Many into 2 One-to-Many’s Orders Products 1 to Many LineItems 1 to Many Intersection Table (or None) 43 Cardinality: 1 to 1 (MS Access notation) 44 Cardinality: 1 to many (MS Access notation) 45 Steps in data modeling Modeling 1. Identify and diagram all ENTITIES 2. Add PK attributes – i.e., implement entity integrity Ensure PK’s are non-null & non-duplicates 3. Identify and diagram all RELATIONSHIPS Note CARDINALITIES (1 to 1, 1 to n, n to n) 4. Add FK attributes – i.e., implement referential integrity (this is automatic in some tools—MS Access) 5. Add remaining attributes 46 ERD Example: Course Registration System Courses (CourseNo (PK), CourseDescripition, InstructorID, CreditPoints, ClassroomNo) PreRequisites (CourseNo (PK), PreRequisiteNo (PK), Comments) Students (StudentID (PK), LastName, FirstName, SSN, Department, College, Major, EMail) Enrollment (StudentID (PK), CourseNo (PK), Comments) Instructors (InstructorID (PK), LastName, FirstName, Telephone, EMail) Classrooms (ClassroomNo (PK), ClassroomName, Building, BuildingRoomNo, Equipment, Capacity) Note: PK denotes a primary key 47 Example: Course Registration System Step 1. Draw Entities PreRequisites Course ClassRooms Enrollment Instructors Students 48 Example: Course Registration System Step 2. Add PK’s (undeline/separate with a line) PreRequisites Course Instructors CourseNo PreRequisiteNo CourseNo InstructorID Enrollment ClassRooms ClassroomNo StudentID CourseNo Students StudentID 49 Example: Course Registration System Step 3. Add Relationships (w/Cardinalities) PreRequisites PK,FK1 PK CourseNo PreRequisiteNo has Course PK Instructors Teaches CourseNo PK InstructorID Includes Assigned Enrollment ClassRooms PK ClassroomNo PK,FK1 PK,FK2 StudentID CourseNo Students Enrolls PK StudentID 50 Example: Course Registration System Step 4. Add FK’s PreRequisites PK,FK1 PK CourseNo PreRequisiteNo Assigned Course has PK CourseNo FK1 FK2 InstructorID ClassroomNo Instructors Teaches PK InstructorID Includes Enrollment ClassRooms PK ClassroomNo PK,FK1 PK,FK2 StudentID CourseNo Students Enrolls PK StudentID 51 Example: Course Registration System Step 5. Add Remaining Attributes Course PreRequisites PK,FK1 PK CourseNo PreRequisiteNo Comments Assigned Has PK CourseNo FK1 CourseDescription InstructorID CreditPoints FK2 ClassroomNo Instructors Teaches PK InstructorID LastName FirstName Telephone EMail Students ClassRooms PK Includes PK ClassroomNo ClassroomName Building BuildingRoomNo Equipment Capacity Enrollment PK,FK1 PK,FK2 StudentID CourseNo Comments Enrolls StudentID LastName FirstName SSN Department College Major EMail 52 Example: Course Registration System 53 EXAMPLE: Package Delivery Tracking System Deliveries Clients PK ClientID Deliveries Deliveries Deliveries PK PK DeliveryNo DeliveryNo PK PK DeliveryNo DeliveryNo LastName FirstName Address Telephone FK4 ClientID ClientID FK4 FK5 DriverNo FK5 DriverNo Date Status Clients Clients PK PK ClientID Trucks Trucks PK PK Trucks PK TruckNo TruckNo Packages Packages Packages Packages PackageNo PK PK PK PackageNo PackageNo PackageNo FK4 DeliveryNo FK4 DeliveryNo Size Charge Drivers Drivers Drivers PK PK DriverNo DriverNo PK DriverNo Drivers FK1 TruckNo TruckNo PK DriverNo Make Model Year FK1 TruckNo DriverName LicenseNo 54 Example: Package Delivery Tracking System 55 EXAMPLE: Airline Reservation System 56 Example: Airline Reservation System 57 Database Design Issue #4: Implement Important Rules: Update, Delete and Business Rules 58 Referential Integrity: Update Rules What can be updated/modified in the database and when? 1. It is OK to update values in any non-PK fields, provided that referential integrity and business rules are respected 2. It is OK to update values in the PK in one table if it is not linked to a FK in another table, provided that entity integrity, referential integrity and business rules are respected 3. If a PK is linked to a FK in another table, we need to ensure that referential integrity is maintained. Depending on what makes business sense, the update rule can be either: • U:R (Update:Restrict) – i.e., Disallow updates of values in the PK, or • U:C (Update:Cascade) – i.e., Allow updates, but cascade changes to all related FKs in other tables 59 Referential Integrity: Delete Rules What can be deleted in the database and when? 1. It is OK to delete records in a table [only] if its PK is not linked to a FK in another table 2. If its PK is linked to a FK in another table, we need to ensure that referential integrity is maintained. Depending on what makes business sense, the delete rule can be either: • D:R (Delete:Restrict) – i.e., Disallow deletion of records, or • D:C (Delete:Cascade) – i.e., Allow deletion of records, but cascade deletions in all related tables that contain a FK linked to this table 60 Illustration of Update and Delete Rules Instructors PK InstructorID LastName FirstName Telephone EMail Course PK FK1 CourseNo CourseDescription InstructorID CreditPoints PreRequisites What happens if (how is referential integrity affected): • We change an instructor’s last name? • We change an InstructorID in the Course table? • We change an InstructorID in the Instructors table? • We delete a course? • We delete an instructor 61 Other Data Integrity: Business Rules Most DBMS have features that allow you to impose constraints in the data to meet rules imposed by a company when conducting business: i.e., “business rules” – examples: • CustomerAge >= 18 • OrderQty >= 100 • ProductPrice <= 1000 • PaymentDate <= PurchaseDate + 90 62 Database Design Issue #5: “Normalize” Your Design (we will discuss this later) 63 Database Queries 64 DBMS Functions and Tools • Performs 3 main functions: – Data definition (create databases) – Data manipulation (enter & update data) – Data retrieval (data extraction) • Using Queries (or proprietary features) Plus additional database tools: – Data dictionary: data about the database – Visual tools: report & form design – Data modeling & database design tools – Macros and programming languages – Internet/web features, etc. 65 Queries Often thought of as a method to retrieve data, but queries can also be used to define and manipulate data Databases can be queried in many ways: • Proprietary DBMS commands and languages, which are unique to the particular DBMS product, or • Standard query methods/languages (QBE, SQL, etc.), which most DBMS products support 66 Standard Query Methods Query by Example (QBE) (Design View in MS Access) • Visual interface using examples of data requested • Similar to how you do searches in the library Structured Query Language (SQL) • Popular with power users • Works in most DBMS • Can embed SQL commands in programs, web scripts, etc. • English-like commands, practical • Exact, mathematical: relational algebra & matrix math 67 Query by Example (QBE) • • • Called Query “Design View” in MS Access Column labels are the fields we want to retrieve In table cells we enter “examples” of the info we want 68 Structured Query Language (SQL) 69 SQL Commands Types Only 8 Commands!! • Data Definition: Create, Drop • Data Manipulation: Insert, Update, Delete, Union, Join • Data Retrieval: Select 70 SQL Commands: Data Definition Example: Create & Delete Table called “Employees” One SQL Command (from CREATE to ;) CREATE TABLE Employees (EmployeeID integer, LastName char(24), Fields FirstName char(24), created in Employees Birthday date, table Phone char(10), Notes memo); ; = End of SQL Command DROP TABLE Employees; 71 SQL Commands: Data Manipulation • INSERT: Add new records • UPDATE: Modify existing records • DELETE: Delete records • UNION: Combine records from two tables • JOIN: Combine columns from two tables 72 SQL Commands: Data Manipulation Add & Update Records Insert (add) a complete record (values in all fields): INSERT INTO Employees VALUES (“ae”, “Espinosa”, “Alberto”, 12/12/2002, “885-1958”, “Looks tired, needs a vacation”); Insert (add) partial record (values in some fields only): INSERT INTO Employees (EmployeeID, LastName, FirstName) VALUES (“ae”, “Espinosa”, “Alberto”); Update (modify) record with new values: UPDATE Employees SET LastName=“Espinosa” WHERE EmployeeID = “ae”; 73 Data Extraction Queries: The Idea • Organize database (design, create): – In the most efficient & consistent way (internally) – Not based on how you want the data to look • Produce the “virtual” temporary tables the way you want them to look using queries How we store the data How we display the data 74 Data Extraction in SQL: The “SELECT” Command Note BOLDFACE denotes SQL Keywords SELECT field1, field2, etc. – columns to retrieve and display FROM table1, table2, etc. – tables that contain the data WHERE condition1 – which records to retrieve AND [OR] condition2 ……. – further conditions GROUP BY field2, ….. – to group results HAVING condition3 – like WHERE but after grouping ORDER BY field1, field2, etc.... [DESC] [ASC] – to sort the query results SELECT can be followed by: DISTINCT (SELECT DISTINCT eliminates duplicate rows from result) TOP n (lists only the top n rows of result – e.g. SELECT Top 5) * (lists all fields in the table – e.g., SELECT * FROM …) 75 Complexity of SELECT Queries • Simple Queries: Involve a single table • Queries with Aggregate Functions: When we only want averages, totals, etc. • Queries with Aggregate Functions and Grouping: When we want averages, totals, etc. categorized by groups • Complex Queries with Joins: Involve more than one table • Complex Nested Queries: Sub-queries (which compute something) within queries 76 Simple SQL SELECT Queries SELECT ProdID, ProdName, Type, Price (a list of fields) FROM Products (the table where the data resides) WHERE Price>=300; (which rows to display) SELECT ProdName, Price FROM Products WHERE Price>=120 AND Type=“Percussion”; Note: the SQL DELETE command works identically to the SELECT command, but instead of displaying the results, it deletes them. For example, the following DELETE command deletes all the records displayed with the previous query (it is not a bad idea to view the records before you delete them) DELETE ProdName, Price FROM Products WHERE Price>=120 AND Type=“Percussion”; Note: an SQL query can not only contain a list of fields, but also any expression involving on or more fields. For example: SELECT Labor, Parts, Labor+Parts AS Charges (column name) FROM Repairs WHERE Labor+Parts>=300; 77 Delimiters When writing WHERE conditions and similar statements in SQL, one often needs to compare a field with a particular value. The values need to be written within delimiters that match the data type. These are the delimiters: • Text – quotes: ex. WHERE UserID = “alberto” • Date/time – pound sign ex. WHERE OrderDate > #01/07/2008# • Number – nothing ex. WHERE Amount > 200 78 SQL Queries With Aggregate Functions • These queries yield a single number result (i.e., a table with 1 column and 1 row) • The only thing you can include in the SELECT line are the fields you are aggregating Aggregate functions you can use: Avg, Sum, Min, Max, Count • • These functions aggregate vertically a column of (usually numeric) values (e.g., salaries, payment amounts, etc.) SELECT Avg(Price) AS AvgPrice FROM Products WHERE Price>=120 AND Type=“Percussion”; Note: the AS clause is optional; it does not change the query results; it only changes the column label in the results SELECT Max(Price) AS MaxPrice, Avg(Price) AS AvgPrice FROM Products WHERE Type=“Guitars”; SELECT Count(*) as TotOrders FROM Orders WHERE OrderStatus = “Top Priority” Note: you can use more than one aggregate function in one SELECT command Note: the Count function counts how many rows meet the Where criteria, so it you can use any column you wish to count and you will get the same results – the easiest thing is to use Count(*) 79 SQL Queries With Aggregate Functions and Grouping • The ONLY things you can include in the SELECT line are: (1) the fields you are aggregating [e.g., Avg(Price)] (2) and the fields you are using to group [e.g, Type] SELECT Type, Avg(Price) AS AvgPrice, Max(Price) AS MaxPrice FROM Products Note: the WHERE WHERE Price>=1000 clause is evaluated BEFORE the grouping GROUP BY Type SELECT Type, Avg(Price) AS AvgPrice, Max(Price) as MaxPrice FROM Products Note: the HAVING GROUP BY Type clause is evaluated HAVING Avg(Price)>1000 AFTER the grouping 80 Complex SELECT Queries with Joins Tables: Orders (OrderNo, ClientID, OrderDate, OrderStatus) LineItems (OrderNo, LineItem, ProdID, Qty) Table Join (2 ways): SELECT Orders.OrderNo, OrderStatus, ClientID, LineItem, ProdID, Qty FROM Orders, LineItems WHERE Orders.OrderNo = LineItems.OrderNo; Join Condition Table Product (WRONG!! Don’t forget the join condition): SELECT Orders.OrderNo, ClientID, LineItem, ProdID, Qty FROM Orders, LineItems; 81 Complex SELECT Queries with Joins: TIPS COMPLEX queries that JOIN 2 tables are identical to SIMPLE queries, except for 2 additional rules you MUST ALWAYS apply: 1. The two tables need to be JOINED through the common field that links them e.g., WHERE Orders.OrderNo = LineItems.OrderNo 2. ANY time you refer to a COMMOND FIELD that exists in both tables, you must use a TABLE PREFIX to eliminate the ambiguity e.g., SELECT Orders.OrderNo; WHERE Orders.OrderNo = 990001 For complex queries that JOIN 3 or more tables apply rule 1 for EACH link, and always apply rule 2 – e.g., SELECT Clients.ClientID, ClientName, Orders.OrderNo, OrderStatus, LineItem, ProdID, Qty FROM Clients, Orders, LineItems WHERE Clients.ClientID = Orders.ClientID AND Orders.OrderNo = LineItems.OrderNo 82 Mapping QBE to SQL SELECT FROM JOIN CONDITION WHERE ORDER BY 83 Nested Queries w/Aggregates Display above average quantities in line items – i.e., order number, line item, product ID and quantity for any line item in which the quantity ordered is above the average quantity ordered in all orders: SELECT OrderNo, LineItem, ProdID, Qty FROM LineItems WHERE Qty>(SELECT Avg(Qty) FROM LineItems) Tip: prepare the sub-query first: SELECT Avg(Qty) FROM LineItems 84 Nested Queries w/Aggregates Display product ID’s and total quantities ordered for that product for totals exceeding 1000 units (2 solutions): SELECT ProdID, TotQty FROM (SELECT ProdID, Sum(Qty) AS TotQty FROM LineItems GROUP BY ProdID) WHERE TotQty > 1000 Tip: again, prepare the sub-query first: SELECT ProdID, Sum(Qty) AS TotQty FROM LineItems GROUP BY ProdID Alternative solution without sub-query: SELECT Prod ID, Sum(Qty) AS TotQty FROM LineItems GROUP BY ProdID HAVING Sum(Qty) > 1000 85 Nested Queries w/Lists Produce Sub-Query First – e.g., a single-column table (list) SELECT PartNumber FROM Shipments WHERE SupplierID = "S5“ GROUP BY PartNumber HAVING Avg(Qty)>200; Then enclose the Sub-Query in parenthesis and use with IN keyword SELECT DISTINCT PartName FROM Parts WHERE PartNumber IN (SELECT PartNumber FROM Shipments WHERE SupplierID = "S5" GROUP BY PartNumber HAVING AVG(Qty)>200); Another example: SELECT DISTINCT PartName FROM Parts WHERE PartNumber IN (352, 353, 354) 86 Database Design Issue #5: “Normalize” Your Design 89 Database Design Goals • Data integrity (Entity and Referential Integrity – ERD’s) Avoid anomalies in the data • No data redundancy Record the data in one place only • Efficient data entry Duplicate data means having to enter the same data more than once • Consistency Duplicate data can lead to inconsistencies when the data changes e.g., 2 different addresses for same client • Flexibility and easy evolution East to maintain, update and add new tables Normalization 90 Why Normalization? • Question: if a data model/ERD is sound and all entity integrity, referential integrity, update/delete and business rules have been well implemented, does this guarantee a good database design? Answer: not necessarily. If your design is not “normalized”, you could have redundant data, and that would be a BAD thing (design) • Normalization should yield the most efficient way to organize and record the data internally—not necessarily how users want to see the data, but what makes more sense for non-redundant data storage • We can later build user table views (i.e., what the user wants or needs to see) by querying these normalized tables. • Redundancy: only PK and FK (e.g., client ID’s) values should appear in multiple tables (because they are needed to link tables) Non-key data (e.g., client last name) that appears in multiple tables is “redundant” 91 Example You gather requirements from users and one user gives you this table and tell you that she would like the system to collect this data. How would you organize this data internally in the database? 92 Normalization • Normalization = The systematic process of “decomposing” a set of unorganized tables with redundant data into smaller, simpler, and more organized tables with only minimal data redundant in key fields and no data redundancy on non-key fields — i.e., from chaos to order Decompose to most efficient internal organization You can always recover the original data format with a query Decomposition Query 93 Degree of Normalization • Normalization is a matter of degree -- the more normalized your design is, the lower the chances of having redundant data • Normal Forms (NF) (higher NF designs are more normalized): 1NF 2NF 3NF BCNF PJNF DKNF 4NF 5NF • The process of normalizing a design to 3NF may seem complex, but the concept is very simple: (1) Minimize data redundancy in key attributes -- i.e., data in key fields can be entered in more than one table (2) Eliminate data redundancy in non-key attributes -- i.e., data in non-key fields should be entered only in one table (3) Ensure that every piece of data (each non-key attribute) can be unambiguously located by its PK (4) Each incremental NF gets us a step closer in this direction 94 Normal Forms To what extent is a database normalized? • Normalization is a matter of degree • Measured in what is called “normal forms” (NF) • 1NF, 2NF, 3NF, etc., higher NF = more normalized • 3NF Good enough for most applications • BCNF Boyce-Codd NF (more robust version of 3NF) Mostly of academic interest (and complex applications): • 4NF, 5NF or PJNF (Project Join), DKNF (Domain-Key) More advanced theoretically, little practical use Useful for research and formal methods only 95 Q: What’s wrong with this table? A: Data in PayDate & Amount fields not single-valued —i.e., they have repeating values 96 Similar Table, Same Problem A: repeating values for a PK value PK is duplicate 97 First Normal Form (1NF) • • • A “TABLE” is in 1NF if there are no multi-valued attributes and no PK is duplicated i.e., attributes are “atomic” A “DATABASE” is in 1NF if ALL its tables are in 1NF 98 Decomposition to 1NF: Create a separate table where the repeating values can be recorded as rows 99 Decomposition 100 Q: What’s wrong with this table? A: Some data in the Client and OrderDate fields are entered twice i.e., some non-key data are redundant i.e., there are “partial dependencies” in the table (see next slide) 101 Functional Dependencies • An attribute B is functionally dependent on attribute A if the value of a valid instance of attribute A uniquely determines the value of attribute B • Represented as: A B 102 Functional Dependency Examples StudentID StudentID StudentName StudentMajor What are the functional dependencies in this relations? Clients (ClientID, ClientName, City, State, Zip) LineItems (OrderNo, LineItem, ClientID, ProdID, Qty) 103 Second Normal Form (2NF) • Applies to tables with “composite” PKs (i.e., PK has more than one attribute) • A “TABLE” is in 2NF if (1) it is in 1NF, and (2) non-key attributes are functionally dependent on the whole PK, not on just part of it (i.e., no partial dependencies) • Note: we only need to worry about 2NF when PK contains more than one attribute (i.e., “composite”) • That is: if a table is in 1NF and has a single PK, it is automatically in 2NF • A “DATABASE” is in 2NF if ALL its tables are in 2NF 104 Decomposition to 2NF Move the partial key (e.g., OrderNo) and the fields that are functionally dependent on only that part of the key (e.g., ClientID, OrderDate) to a separate table and make that partial key the PK in that new table 105 Decomposition 106 Q: What’s wrong with this table? A: Some of the data in the ClientCity field is redundant, because once we know who the ClientID is, we know the city where they live i.e., there are “transitive dependencies” in the table 107 Transitive Dependencies • If a non-key attribute C is functionally dependent on another non-key attribute B (BC) and B is in turn dependent on the PK attribute A (AB) this implies C is transitively dependent on A (AC) (through B or ABC), which will cause redundancies • In 2NF, all non-key attributes are functionally dependent on the PK • Thus, in a 2NF table, a transitive dependency will occur every time there is a functional dependency between any two non-key attributes. 108 Transitive Dependency Examples OrderNo CourseNo ClientID InstructorID ClientName InstructorName Are there transitive dependencies in these relations? LineItems (OrderNo, LineItem, ProdID, Qty) LineItems (OrderNo, LineItem, ProdID, ProdName, Qty) 109 Third Normal Form (3NF) • A “TABLE” is in 3NF if (1) it is in 2NF and (2) non-key attributes depend on the PK and nothing else • That is, non-key attributes are NOT functionally dependent on other non-key attributes (just on the PK) • In other words, there are no transitive dependencies • A “DATABASE” is in 3NF if ALL its tables are in 3NF 110 Decomposition to 3NF: Move the fields with transitive dependencies to a separate table 111 Decomposition 112 In Summary • 1NF = no multi-value attributes (or no PK duplicates) • 2NF = 1NF + the “whole” PK, not just part of it • 3NF = 2NF + the PK and “nothing but” the PK • Important! it is OK to have non-normalized designs, and some database applications may actually require a non-normalized design, but you must have an understanding of which normalization form you are violating and a good reason for doing it 113 Exercises Text p.203, problem 3: Indicate the normal form (PK underlined) and decompose to 3NF Class (CourseNo, SectionNo, RoomNo) Class (CourseNo, SectionNo, RoomNo, Capacity) Class (CourseNo, SectionNo, CourseName, RoomNo, Capacity) 114 Exercises POS System: Indicate the normal form (PK underlined) and decompose to 3NF Sales (SaleNo, ClientID, ClientName, SaleDate, SaleAmount) SalesDetails (SaleNo, LineItem, SaleDate, ProdID, ProdName, Qty) Other Systems: VideoRental (VideoNo, Date, MovieID, MovieName, ClientID) VideoRental (VideoNo, Date, ClientID, CheckoutDate, RentalDays) Videos (VideoNo, MovieID, MovieName, MovieType) Videos (VideoNo, MovieID, VideoCondition) Movies (MovieID, MovieName, MovieType, Producer, ReleaseDate) 115 Exercise Indicate the normal form and decompose to 3NF 116 FYI, Conceptually, normalization can be thought of the opposite of a SELECT SQL query. When you normalize, you decompose a large table into simpler, smaller tables without redundancies. In contrast, when you query several small tables, the result is a larger table in which redundancies don’t matter. For example, the decomposed tables of the exercise in the prior page can be reconstructed by querying the normalized tables as follows: SELECT Companies.CompanyID, CompanyName, Employees.EmployeeID, EmployeeName, Departments.DeptID, DeptName FROM Departments, Companies, Employees WHERE Companies.CompanyID = Employees.CompanyID AND Departments.DeptID = Employees.DeptID 117 Exercise Indicate the normal form and decompose to 3NF (and then try to write an SQL query to re-construct the original table) 118 FYI Only • Boyce-Codd Normal Form (BCNF): – A more robust version of 3NF – A database is in BCNF when the database is in 3NF when you substitute the PK with any other Alternative Key – That is, the database is in 3NF for all Candidate Keys • Domain-Key Normal Form (BKNF): – All values entered in an attribute satisfy the constraints defined in the domain of that attribute – An attribute’s domain is the pool of data from which the attribute can draw its values – Example: if we define a constraint for the OrderID attribute (e.g., 6 digits, from 000001 to 999999) in general (i.e., the domain), the OrderID attribute in every table that uses this attribute, must satisfy the same constraints. 119 Transitional Artifact: The CRUD Matrix Connecting Data Objects to Use Cases Data Objects • A data object is a person or thing you want to collect data for: • In a database application a data object is a table Examples: courses, students, clients, invoices, orders, deliveries 121 Identifying Data Objects To identify data objects, refer to the Use Cases (or other requirements artifacts) and: • Identify and highlight (or bold face) all nouns • Inspect these nouns to see if they represent possible system data objects • But be careful, a noun may not refer to a data object, but simply to an attribute of a data object • A data object maps to a class (in a class diagram), entity (in a data model) or table (in a database) • A data object has attributes (and behaviors if object is for a class) • An attribute is something you want to record about a data object • For example, in Students (StudentID, Name, SSN, Email)—Students represents a data object and the data inside the parenthesis represents attributes of that data object 122 The CRUD Matrix • A “transitional artifact” is one that helps establish a relationship or cross reference between artifacts • A CRUD matrix is a transitional artifact between Use Cases and Data Objects • Helps ensure that the Use Cases specified have all the necessary Data Objects to handle the data needs of the application and, conversely, that the collection of Data Objects identified cover the entire functionality specified in the requirements. • The Use Cases, if properly specified, must describe all the actions necessary to maintain all data objects • A CRUD matrix is a table that cross references which Use Cases: (C)reate, (R)ead, (U)pdate and/or (D)elete data in these objects 123 Developing a CRUD Matrix • The CRUD matrix has one row for every data object identified and one column for every Use Case specified (or the other way around) • So, first create a column (or row) for every Use Case in your model • Every noun highlighted in the Use Cases will suggest the need for data object to store the respective data you, so you need to create a row (or column) for each of these data objects. • Then go through every cell in the first Use Case and enter a C, R, U and/or D on the cell depending on whether the Use Case is creating, reading, updating or deleting records in the respective data object. • The data objects should give you an indication of the entities (i.e., database tables) that you will need in your Data Model (and database) • And the C’s, R’s, U’s and D’s should give you an idea of the SQL queries that your application will need 124 Illustration Table 1 Table 2 Table 3 UC-101 UC-102 C R UC-103 U D • UC-102 reads data from Table 1 It will require an SQL SELECT query • UC-101 creates a record in Table 1 It will require an SQL INSERT query • UC-103 deletes records data from Table 3 It will require an SQL DELETE query • UC-102 updates data in Table 2 It will require an SQL UPDATE query 125 CRUD Matrix Example for a Loan Processing Application Use Case Data Object Submit a Loan Request Evaluate a Loan Request Applicant C Loan Application C R Credit Score C R Credit Report C R Account History C R Loan Request C R,U Loan Officer R Evaluation C Book a Loan R R Loan Agreement R Loan Account C Loan Clerk R In a database application, these are tables and these are queries 126 ATM Application Example 127 ATM Use Case Use Case ID UC-100 Use Case Withdraw Funds Actors (P) Customer Description The customer inserts card in the ATM, logs in with a pass code, and makes a selection from the available choices to withdraw funds. Once in the funds withdrawal screen, the customer is prompted to enter the amount to withdraw. After the amount is entered, the system will check for availability of funds for that customer. Provided that funds are available, the system will dispense the amount requested in cash and then debit that amount from the customer’s bank account. The system will record the last withdrawal date in customer’s file and record transaction in ATM transaction log . Priority Non-Functional Requirements Assumptions Source 128 ATM Use Case Use Case ID UC-101 Use Case Deposit Funds Actors (P) Customer Description The customer inserts card in the ATM, logs in with a pass code, and makes a selection from the available choices to deposit funds. Once in the funds deposit screen, the customer is prompted to enter the amount to deposit. After the amount is entered, deposit slot door opens, customer places deposit envelop in slot, deposit slot door closes. The system credits the customer’s account accordingly, records the last deposit date in the customer’s file and record the transaction in ATM transaction log. Priority Non-Functional Requirements Assumptions Source 129 ATM Use Case Use Case ID UC-102 Use Case Transfer Funds Actors (P) Customer Description The customer inserts card in the ATM, logs in with a pass code, and makes a selection from the available choices to transfer funds. Once in the funds transfer screen, the customer is prompted to enter the amount to transfer, from account and to account. After the information is entered, the checks for availability of funds. If funds are available, it displays the transaction and asks for confirmation. The customer confirms transaction and the customer’s account gets adjusted accordingly. The system records the last funds transfer date in the customer’s file and records the transaction in ATM transaction log. Priority Non-Functional Requirements Assumptions Source 130 ATM Use Case Use Case ID UC-103 Use Case Balance Inquiry Actors (P) Customer Description The customer inserts card in the ATM, logs in with a pass code, and makes a selection from the available choice to inquire balances. The machine prints balances, records the last balance inquiry date in the customer’s file and records the transaction in ATM transaction log . Priority Non-Functional Requirements Assumptions Source 131 ATM System’s CRUD Matrix Use Case Deposit Funds Transfer Funds Inquire Balances C U U U Customer File R,U R,U R,U R,U Customer Account R,U U R,U R C U U Data Object ATM ATM Transaction Log Customer Transactions Withdraw Funds R,U 132