Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Relational algebra wikipedia , lookup
Microsoft Jet Database Engine wikipedia , lookup
Microsoft SQL Server wikipedia , lookup
Extensible Storage Engine wikipedia , lookup
Entity–attribute–value model wikipedia , lookup
Functional Database Model wikipedia , lookup
Open Database Connectivity wikipedia , lookup
Clusterpoint wikipedia , lookup
MGS 8020 Business Intelligence The Data Warehouse & Relational Database Management System Feb 5, 2015 Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 1 Agenda Data Warehouse Georgia State University - Confidential Relational Database Designing & Building the Data Warehouse MGS8020_05.ppt/Feb 5, 2015/Page 2 The Data Warehouse The Data Warehouse • is physically separated from all other operational systems • holds aggregated data and transactional data for management separate from that data used for online transaction processing Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 3 Data Flow Operational Data Store Legacy Systems Data Warehouse Personal Data Warehouse Data Mart Metadata Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 4 The Data Warehouse The Data Warehouse • is physically separated from all other operational systems • holds aggregated data and transactional data for management separate from that data used for online transaction processing Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 5 Characteristics of a Data Warehouse • Subject Orientation • Data Integrated Consistent Naming and Measurement Attributes • Time Variant • Nonvolatility Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 6 Executive Information System Internal Source Systems Extract, Transformation and Load Business Intelligence & Data Warehouse External Data Sources Georgia State University - Confidential Dashboard Balanced Scorecard KPI Financial Metrics Data Warehouse Data Mart Business Intelligence Advanced Analytics Business Objects Cognos Microstrategy Brios Microsoft Access Etc. SAS Minitab SPSS MGS8020_05.ppt/Feb 5, 2015/Page 7 Data Warehouse Vendors • Business Objects • Cognos • Hyperion • IBM • Microsoft • NCR / Teradata • Oracle • SAS Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 8 Metadata What is Metadata? • • • Data about Data Without metadata, the data is meaningless Provides consistency of the truth Components of Metadata • • • • • • • Transformation Mapping Extraction and Relationship History Algorithms for Summarization (and calculations) Data Ownership Patterns of Warehouse Access Business Friendly naming conventions Status Information Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 9 Agenda Data Warehouse Georgia State University - Confidential Relational Database Designing & Building the Data Warehouse MGS8020_05.ppt/Feb 5, 2015/Page 10 Relational Database A relational database is a collection of data items organized as a set of formally-described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables. The relational database was invented by E. F. Codd at IBM in 1970. The standard user and application program interface to a relational database is the structured query language (SQL). SQL statements are used both for interactive queries for information from a relational database and for gathering data for reports. A relational database is a set of tables containing data fitted into predefined categories. Each table (which is sometimes called a relation) contains one or more data categories in columns. Each row contains a unique instance of data for the categories defined by the columns. For example, a typical business order entry database would include a table that described a customer with columns for name, address, phone number, and so forth. Another table would describe an order: product, customer, date, sales price, and so forth. A user of the database could obtain a view of the database that fitted the user's needs. For example, a branch office manager might like a view or report on all customers that had bought products after a certain date. A financial services manager in the same company could, from the same tables, obtain a report on accounts that needed to be paid. Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 11 Relational Database When creating a relational database, you can define the domain of possible values in a data column and further constraints that may apply to that data value. For example, a domain of possible customers could allow up to ten possible customer names but be constrained in one table to allowing only three of these customer names to be specifiable. The definition of a relational database results in a table of metadata or formal descriptions of the tables, columns, domains, and constraints. Meta is a prefix that in most information technology usages means "an underlying definition or description." Thus, metadata is a definition or description of data and metalanguage is a definition or description of language. A database is a collection of data that is organized so that its contents can easily be accessed, managed, and updated. The most prevalent type of database is the relational database, a tabular database in which data is defined so that it can be reorganized and accessed in a number of different ways. A distributed database is one that can be dispersed or replicated among different points in a network. An object-oriented programming database is one that is congruent with the data defined in object classes and subclasses. SQL (Structured Query Language) is a standard interactive and programming language for getting information from and updating a database. Although SQL is both an ANSI and an ISO standard, many database products support SQL with proprietary extensions to the standard language. Queries take the form of a command language that lets you select, insert, update, find out the location of data, and so forth. Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 12 Internal Source Systems Extract, Transformation and Load Business Intelligence Environment External Data Sources Data Warehouse Data Mart SQL RDBMS Business Intelligence Business Objects Cognos Microsoft Access Etc. Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 13 Relational Database • IBM DB2, DB2/400 • Microsoft SQL/Server • Teradata • Oracle • Sybase • Informix / Red Brick • Microsoft Access • MySQL Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 14 Relational Database SQL Request RDBMS BI Software Application Result Set Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 15 SQL SQL – Structured Query Language 1. DDL – Data Definition Language • • • Create Drop Alter 2. DML – Data Manipulation Language • • • • Insert Update Delete Select Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 16 SQL Select Statement SELECT column1, column2, . . . FROM table1, table2, . . . WHERE AND/OR ..... criteria1 criteria2 ORDER BY column1, column1, . . . Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 17 SQL Select Statement Aggregation SELECT column1, column2, . . . FROM table1, table2, . . . WHERE AND/OR ..... criteria1 criteria2 GROUP BY column1, column1, . . . HAVING AND/OR ..... criteria1 criteria2 ORDER BY column1, column1, . . . Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 18 SQL – Example 1 SQL CREATE TABLE ADDR_BOOK ( NAME char(30), COMPANY char(20), E_MAIL char (25) Output Name Company Email John Smith Microsoft [email protected] Jeff Jones Delta [email protected] Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 19 SQL – Example 2 2a) 2b) SQL Table - Product SELECT ID Name Category NAME, I Internet A COMPANY, B Browsers A E_MAIL A Application Null FROM G Graphics Null ADDR_BOOK WHERE COMPANY = ‘Microsoft' SQL SELECT Output Name ID, Company John Smith Microsoft Email NAME [email protected] from PRODUCT WHERE CATEGORY = NULL Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 20 SQL – Example 3 SQL SELECT ADDR_BOOK.NAME, COMPANY.EMAIL FROM ADDR_BOOK, COMPANY WHERE ADDR_BOOK.EMPLOYEE_ID = COMPANY.EMPLOYEE_ID Output Name Email John Smith [email protected] Jeff Jones [email protected] Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 21 SQL – Example 4 SQL SQL CREATE TABLE CUSTOMER ( SELECT CUST_NO INTEGER, ORDER.ORDER_NO, CUSTOMER.NAME, CUSTOMER.ADDRESS, CUSTOMER.CITY, CUSTOMER.ZIP_CIDE, CUSTOMER.COUNTRY FIRST_NAME CHAR(30), LAST_NAME CHAR(30), ADDRESS CHAR(50), CITY CHAR(30), STATE CHAR (2), ZIP_CODE CHAR(9), COUNTRY CHAR(20) ) FROM ORDER, CUSTOMER WHERE ORDER.CUST_NO = CUSTOMER.CUST_NO AND CREATE TABLE ORDER ( ORDER_NO INTEGER, ORDER.DATE_ENTERED = '1998-2011' DATE_ENTERED DATE, CUST_NO INTEGER ) Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 22 SQL – Example 5 SQL CREATE TABLE ADDR_BOOK ( NAME char(30), COMPANY char(20), E_MAIL char (25) Output Name Company Email John Smith Microsoft [email protected] Jeff Jones Delta [email protected] Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 23 SQL – Example 6 – Referential Integrity SQL SQL CREATE TABLE CUSTOMER ( CREATE TABLE ORDER_ITEMS ( CUST_NO INTEGER PRIMARY KEY, ORDER_NO INTEGER, FIRST_NAME CHAR(30), ITEM_NO INTEGER, LAST_NAME CHAR(30), PRODUCT CHAR(30), ADDRESS CHAR(50), QUANTITY INTEGER, CITY CHAR(30), UNIT_PRICE MONEY ) ZIP_CODE CHAR(9), COUNTRY CHAR(20) ) CREATE TABLE ORDER ( ALTER TABLE ORDER_ITEMS ADD PRIMARY KEY PK_ORDER_ITEMS (ORDER_NO, ITEM_NO) ORDER_NO INTEGER PRIMARY KEY, DATE_ENTERED DATE, CUST_NO INTEGER REFERENCES CUSTOMER (CUST_NO) ) ALTER TABLE ORDER_ITEMS ADD FOREIGN KEY FK_ORDER_ITEMS_1 (ORDER_NO) REFERENCES ORDER (ORDER_NO) Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 24 SQL – Example 7 – Index When you have a primary key, you already have an implicitly (or explicitly) defined unique index on the primary key columns. It's generally a good idea to define non-unique indexes on the foreign keys. SQL CREATE UNIQUE INDEX PK_CUSTOMER ON CUSTOMER (CUST_NO) CREATE UNIQUE INDEX PK_ORDER ON ORDER (ORDER_NO) CREATE INDEX FK_ORDER_1 ON ORDER (CUST_NO) CREATE UNIQUE INDEX PK_ORDER_ITEMS ON ORDER_ITEMS (ORDER_NO, ITEM_NO) CREATE INDEX FK_ORDER_ITEMS_1 ON ORDER_ITEMS (ORDER_NO) Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 25 Agenda Data Warehouse Georgia State University - Confidential Relational Database Designing & Building the Data Warehouse MGS8020_05.ppt/Feb 5, 2015/Page 26 Why Business Intelligence 1. Improve consistency and accuracy of reporting 2. Reduce stress on operational systems for reporting and analysis 3. Faster access to information 4. BI tools provide increased analytical capabilities 5. Empowering the Business User 6. Companies are realizing that data is a company’s most underutilized asset Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 27 ERM vs. DM ERM - Entity Relationship Model • Remove redundancy • Efficiency of transactions DM - Dimensional Model • Intuitive View of the Data • Efficiency of access and analysis Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 28 Dimensional Model Dimension Table Dimension Table Primary_Key Descriptive_Attribute_1 Descriptive_Attribute_2 Descriptive_Attribute_3 Descriptive_Attribute_4 Descriptive_Attribute_5 Descriptive_Attribute_6 Descriptive_Attribute_7 .... Dimension Table Primary_Key Descriptive_Attribute_1 Descriptive_Attribute_2 Descriptive_Attribute_3 Descriptive_Attribute_4 Descriptive_Attribute_5 Descriptive_Attribute_6 Descriptive_Attribute_7 .... Georgia State University - Confidential Fact Table Foreign_Key_1 Foreign_Key_2 Foreign_Key_3 Foreign_Key_4 Metric_1 Metric_2 .... Star Schema Primary_Key Descriptive_Attribute_1 Descriptive_Attribute_2 Descriptive_Attribute_3 Descriptive_Attribute_4 Descriptive_Attribute_5 Descriptive_Attribute_6 Descriptive_Attribute_7 .... Dimension Table Primary_Key Descriptive_Attribute_1 Descriptive_Attribute_2 Descriptive_Attribute_3 Descriptive_Attribute_4 Descriptive_Attribute_5 Descriptive_Attribute_6 Descriptive_Attribute_7 .... MGS8020_05.ppt/Feb 5, 2015/Page 29 Retail Sales Dimensional Model (Partial) Product Dimension Table Time Dimension Table Time_Key (PK) Date Day_of_Week Week_Number Month .... Etc. Sales Fact Table Store Dimension Table Store_Key (PK) Store_ID Store_Name Address District Floor_Plan .... Etc. Georgia State University - Confidential Time_Key (FK) Product_Key (FK) Store_Key (FK) Customer_Key (FK) Units Revenue Cost ... Product_Key (PK) SKU_Number Description Brand Product_Category Size .... Etc. Customer Dimension Table Customer_Key (PK) Customer_Name Purchase_Profile Credit_Profile Demographic_Category Address .... Etc. MGS8020_05.ppt/Feb 5, 2015/Page 30 Fact Table Sales Fact Table Time_Key (FK) Product_Key (FK) Store_Key (FK) Customer_Key (FK) Units Revenue Cost ... Georgia State University - Confidential 1. Contains Foreign Keys that relate to Dimension Tables 2. Have a many-to-one relationship to Dimension Tables 3. Contains Metrics to be aggregated 4. Typically does not contain any non-foreign key or non-metric data elements 5. Level of Granularity defines depth and flexibility of analysis MGS8020_05.ppt/Feb 5, 2015/Page 31 Dimension Table Product Dimension Table Product_Key (PK) SKU_Number Description Brand Product_Category Size .... Etc. Georgia State University - Confidential 1. Contains a Primary Key that relates to the Fact Table(s) 2. Has a one-to-many relationship to the Fact Table(s) 3. Contains Descriptive data used to limit and aggregated metrics from the Fact Table(s) 4. Can sometimes contain preaggregated data MGS8020_05.ppt/Feb 5, 2015/Page 32 Warehouse Architecture Specification • Common Sources • Common Dimensions • Common Business Rules • Common Semantics • Common Metrics Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 33 Time Dimension Date (Primary Key) – a day, 365 per year Week Month Quarter Year Fiscal Month Fiscal Quarter Fiscal Year – defined by an end of week day – January, February, March, . . . – Q1: 01/01 – 03/31 Q2: 04/01 – 06/30 Q3: 07/01 – 09/30 Q4: 10/01 – 12/31 – 2000, 2001, 2002, 2003 – 4/4/5 Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 34 Time Dimension Date (Primary Key) – a day, 365 per year Weekday/Weekend Day of Week – Monday, Tuesday, Wednesday, . . . Season – Winter, Spring, Summer, Fall Holiday – Labor Day, 4th of July, Memorial Day, . . . ... Georgia State University - Confidential MGS8020_05.ppt/Feb 5, 2015/Page 35