Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
INFS 6220 Systems Analysis & Design Transactional DBs vs. Data Warehouses Relational Databases (RDBMS) Collection of linked tables Tables linked by Primary Key / Foreign Key relationships (Referential Integrity) Primary Key – column whose values make each record unique Foreign Key – value in column that links to Primary Key in another table SQL – Structured Query Language (language to access data in relational tables) Relational DB Example CUSTOMER TABLE Cust # 100 101 102 Primary Key Cust Name Bob Sue Juan ORDER TABLE Order # 1 2 3 Prod# QR22 QR22 SB56 Qty 1 25 3 Cust# 100 100 102 Foreign Key Database Structure & Design 2 Approaches: 1. Optimize for Data Capture i.e., Capturing Transactions Conflict 2. Optimize for Data Access i.e., Queries & Reporting Approach #1: Optimize for Data Capture To optimize for data storage, you must: • • • Eliminate redundancy of data (or else wasted space & processing occurs) Ensure data integrity (or else data anomalies) Ensure that changes in data (modifications, deletions, etc. only have to happen in one place) Normalization – process in which a DBMS is optimized for data storage • • All data “redundancy” is removed from Database Has multiple forms (0, 1st, 2nd, 3rd, et al.) Moving from 0NF to 1NF Rule: Make a separate table for each set of related attributes, and give each table a primary key of unique values. CUSTOMER TABLE ONF Cust # 100, 100, 101 CustName Bob, Sue, Juan CUSTOMER TABLE 1NF Cust # 100 101 102 Cust Name Bob Sue Juan Primary Key Created with Unique values Moving from 1NF to 2NF Rule: Eliminate any repeating values caused by a dependency on a “keyed” column (i.e., either Primary or Foreign) TABLE X 1NF Cust # 100 100 101 Cust Name Order# Bob Dependency 1 on Bob Primary Key 2 Sue 3 CUSTOMER TABLE 2NF Cust # Cust Name 100 Bob 101 Sue ORDER TABLE Order # 1 2 3 Cust# 100 100 101 Moving from 2NF to 3NF Rule: Eliminate any repeating values caused by a dependency on a “non-keyed” column (i.e., dependency on ANY column) TABLE X 2NF Cust # 100 101 102 City PGH PGH LA Order# 1 2 3 ShipTime 2 days 2 days 5 days SHIP TIME TABLE 3NF City # 10 20 City PGH LA ShipTime 2 days 5 days Dependency b/t 2 non-key columns CUSTOMER TABLE Cust # 100 101 102 City# 10 10 20 Normalized DB Example 9 Approach #2: Optimize for Data Access (in a separate, read-only Data Warehouse) To optimize for data access, you must: • Allow data redundancy • Reduce the number of table joins (links among tables) Denormalizing – Adding redundancy & reducing joins in a DBMS Denormalizing – Most Common Approach Star Schema (Clustering) • • Fact (core or transaction) Tables in middle of star Dimensional (structural or “lookup”) Tables around “points” of star REP TABLE CUSTOMER TABLE Cust # 100 101 102 CustName Bob Sue Juan DATE/TIME Date 06/29/XX 06/30/XX 07/01/XX Quarter 2 2 3 Rep # 1000 2000 3000 ORDER TABLE Order # 1 2 3 Date 06/15/XX 07/19/XX 08/30/XX Bob Sue Juan Cust# 100 100 101 Prod# QR22 QR22 SR56 Rep# 1000 1000 2000 RepName Lee James Natasha PRODUCT TABLE Prod # QR22 SR56 TW43 ProdName Rake Spade Mulch Denormalizing (continued) • Stars are linked via common (i.e., Conformed) Dimensions to form Data Warehouse REP TABLE CUSTOMER TABLE Cust # 100 101 102 CustName Bob Sue Juan CUSTOMER TABLE TIME DATE/TIME Date 06/29/XX 06/30/XX 07/01/XX Quarter 2 2 3 Rep # 1000 2000 3000 ORDER TABLE ORDER TABLE Order # 1 2 3 Date 06/15/XX 07/19/XX 08/30/XX Cust# 100 100 101 Prod# QR22 QR22 SR56 PRODUCT TABLE Prod # QR22 SR56 TW43 Bob Sue Juan INVENTORY TABLE Prod# QR22 TW43 SR56 Rep# 1000 1000 2000 ProdName Rake Mulch Spade Stock Date Units 03/23/XX 150 04/15/XX 1452 05/01/XX 997 RepName Lee James Natasha ProdName Rake Spade Mulch