Chapter 6 The Data Warehouse
Jason C. H. Chen, Ph.D.
Professor of MIS
School of Business Administration
Gonzaga University
Spokane, WA 99223
[email protected]
A/W & Dr. Chen, Data Mining

6.1 Operational Databases

Data Modeling and Normalization
• One-to-One Relationships
• One-to-Many Relationships
• Many-to-Many Relationships

Data Modeling and Normalization
• First Normal Form
• Second Normal Form
• Third Normal Form

Figure 6.1 A simple entity-relationship diagram
[Entities: Vehicle-Type (Type ID, Make, Year) and Customer (Customer ID, Income Range)]

The Relational Model

Table 6.1a • Relational Table for Vehicle-Type
Type ID   Make        Year
4371      Chevrolet   1995
6940      Cadillac    2000
4595      Chevrolet   2001
2390      Cadillac    1997

Table 6.1b • Relational Table for Customer
Customer ID   Income Range ($)   Type ID
0001          70–90K             2390
0002          30–50K             4371
0003          70–90K             6940
0004          30–50K             4595
0005          70–90K             2390

Table 6.2 • Join of Tables 6.1a and 6.1b
Customer ID   Income Range ($)   Type ID   Make        Year
0001          70–90K             2390      Cadillac    1997
0002          30–50K             4371      Chevrolet   1995
0003          70–90K             6940      Cadillac    2000
0004          30–50K             4595      Chevrolet   2001
0005          70–90K             2390      Cadillac    1997

6.2 Data Warehouse Design

Figure 6.2 A data warehouse process model
[Components: External Data, Operational Database(s), ETL Routine (Extract/Transform/Load), Data Warehouse, Dependent Data Mart, Independent Data Mart, Extract/Summarize Data, Decision Support System, Report]

Entering Data into the Warehouse
• Independent Data Mart
• ETL (Extract, Transform, Load) Routine
• Metadata

Structuring the Data Warehouse: The Star Schema
• Fact Table
• Dimension Tables
• Slowly Changing Dimensions

Figure 6.3 A star schema for credit card purchases

Purchase Dimension
Purchase Key   Category
1              Supermarket
2              Travel & Entertainment
3              Auto & Vehicle
4              Retail
5              Restaurant
6              Miscellaneous
...            ...

Time Dimension
Time Key   Month   Day   Quarter   Year
10         Jan     5     1         2002
...        ...     ...   ...       ...

Cardholder Dimension
Cardholder Key   Name         Gender   Income Range
1                John Doe     Male     50 - 70,000
2                Sara Smith   Female   70 - 90,000
...              ...          ...      ...

Fact Table
Cardholder Key   Purchase Key   Location Key   Time Key   Amount
1                2              1              10         14.50
15               4              5              11         8.25
1                2              3              10         22.40
...              ...            ...            ...        ...

Location Dimension
Location Key   Street          City         State   Region
10             425 Church St   Charleston   SC      3
...            ...             ...          ...     ...

The Multidimensionality of the Star Schema

Figure 6.4 Dimensions of the fact table shown in Figure 6.3
[Axes: Cardholder Key, Purchase Key, Time Key, Location Key; the figure marks a point A for cardholder Ci]

Additional Relational Schemas
• Snowflake Schema
• Constellation Schema

Figure 6.5 A constellation schema for credit card purchases and promotions

Promotion Dimension
Promotion Key   Description   Cost
1               watch promo   15.25
...             ...           ...

Time Dimension
Time Key   Month   Day   Quarter   Year
5          Dec     31    4         2001
8          Jan     3     1         2002
10         Jan     5     1         2002
...        ...     ...   ...       ...

Promotion Fact Table
Cardholder Key   Promotion Key   Time Key   Response
1                1               5          Yes
2                1               5          No
...              ...             ...        ...

Purchase Dimension
Purchase Key   Category
1              Supermarket
2              Travel & Entertainment
3              Auto & Vehicle
4              Retail
5              Restaurant
6              Miscellaneous

Purchase Fact Table
Cardholder Key   Purchase Key   Location Key   Time Key   Amount
1                2              1              10         14.50
15               4              5              11         8.25
1                2              3              10         22.40
...              ...            ...            ...        ...

Cardholder Dimension
Cardholder Key   Name         Gender   Income Range
1                John Doe     Male     50 - 70,000
2                Sara Smith   Female   70 - 90,000
...              ...          ...      ...

Location Dimension
Location Key   Street          City         State   Region
5              425 Church St   Charleston   SC      3
...            ...             ...          ...     ...
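The way a fact table is analyzed through its dimension tables (Figures 6.3 and 6.5) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the textbook's implementation: the table and column names (purchase_dim, purchase_fact, and so on) are invented here, and the three fact rows assume the flattened figure lists its values row by row.

```python
import sqlite3

# In-memory database holding one dimension table and the purchase fact table.
# Table and column names are hypothetical; the rows follow Figure 6.5.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE purchase_dim (
    purchase_key INTEGER PRIMARY KEY,
    category     TEXT
);
CREATE TABLE purchase_fact (
    cardholder_key INTEGER,
    purchase_key   INTEGER REFERENCES purchase_dim(purchase_key),
    location_key   INTEGER,
    time_key       INTEGER,
    amount         REAL
);
""")

conn.executemany(
    "INSERT INTO purchase_dim VALUES (?, ?)",
    [
        (1, "Supermarket"),
        (2, "Travel & Entertainment"),
        (3, "Auto & Vehicle"),
        (4, "Retail"),
        (5, "Restaurant"),
        (6, "Miscellaneous"),
    ],
)

# Assumption: each fact row reads (cardholder, purchase, location, time, amount).
conn.executemany(
    "INSERT INTO purchase_fact VALUES (?, ?, ?, ?, ?)",
    [
        (1, 2, 1, 10, 14.50),
        (15, 4, 5, 11, 8.25),
        (1, 2, 3, 10, 22.40),
    ],
)


def total_by_category(connection):
    """Join the fact table to the purchase dimension and sum amounts per category."""
    query = """
        SELECT d.category, SUM(f.amount)
        FROM purchase_fact AS f
        JOIN purchase_dim AS d ON d.purchase_key = f.purchase_key
        GROUP BY d.category
        ORDER BY d.category
    """
    return {category: round(total, 2) for category, total in connection.execute(query)}


totals = total_by_category(conn)
print(totals)
```

The join here is the same operation that produces Table 6.2 from Tables 6.1a and 6.1b: a key stored in one table (purchase_key in the fact table, Type ID in the customer table) is matched against the primary key of another.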
Decision Support: Analyzing the Warehouse Data
• Reporting Data
• Analyzing Data
• Knowledge Discovery

6.3 On-line Analytical Processing

OLAP Operations
• Slice – A single dimension operation
• Dice – A multidimensional operation
• Roll-up – Aggregation, a higher level of generalization
• Drill-down – A greater level of detail; the reverse of a roll-up
• Rotation – View data from a new perspective

Figure 6.6 A multidimensional cube for credit card purchases
[Axes: Month (Jan. through Dec.), Category (Supermarket, Travel, Vehicle, Retail, Restaurant, Miscellaneous), Region (One, Two, Three, Four). Highlighted cell: Month = Dec., Category = Vehicle, Region = Two, Amount = 6,720, Count = 110]

Concept Hierarchy
A mapping that allows attributes to be viewed from varying levels of detail.

Figure 6.7 A concept hierarchy for location
[Levels: Region, State, City, Street Address]

Figure 6.8 Rolling up from months to quarters
[Axes: Time (Q1 through Q4), Category (Supermarket, Travel, Vehicle, Retail, Restaurant, Miscellaneous), Region (One, Two, Three, Four). Highlighted cell: Month = Oct./Nov./Dec., Category = Supermarket, Region = One]

6.4 Excel Pivot Tables for Data Analysis

Creating a Simple Pivot Table
Figure 6.9 A pivot table template
Steps 1, 2 (p. 198)
Steps 2, 3
Step 3
Step 4
Step 5
Step 6
Step 7
Result of Step 7 (p. 198)
Figure 6.10 A summary report for income range
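Both the OLAP operations of Section 6.3 and the pivot-table summaries of Section 6.4 come down to selecting and aggregating cells of a cube. The sketch below illustrates slice, dice, and roll-up on a tiny cube stored as a Python dictionary. Only the Dec./Vehicle/Two amount comes from Figure 6.6; every other cell value is made up for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

# Concept hierarchy for time: each month maps to the quarter that contains it.
MONTH_TO_QUARTER = {
    "Jan.": "Q1", "Feb.": "Q1", "Mar.": "Q1",
    "Apr.": "Q2", "May": "Q2", "Jun.": "Q2",
    "Jul.": "Q3", "Aug.": "Q3", "Sep.": "Q3",
    "Oct.": "Q4", "Nov.": "Q4", "Dec.": "Q4",
}

# Cube cells keyed by (month, category, region) with the purchase amount as value.
cube = {
    ("Dec.", "Vehicle", "Two"): 6720,      # cell shown in Figure 6.6
    ("Oct.", "Supermarket", "One"): 1000,  # hypothetical
    ("Nov.", "Supermarket", "One"): 1200,  # hypothetical
    ("Dec.", "Supermarket", "One"): 1500,  # hypothetical
    ("Jan.", "Retail", "Three"): 800,      # hypothetical
}


def slice_cube(cells, axis, value):
    """Slice: fix a single dimension (0=month, 1=category, 2=region) to one value."""
    return {key: amount for key, amount in cells.items() if key[axis] == value}


def dice_cube(cells, months, categories, regions):
    """Dice: restrict several dimensions at once to the given sets of values."""
    return {
        key: amount
        for key, amount in cells.items()
        if key[0] in months and key[1] in categories and key[2] in regions
    }


def roll_up_months(cells):
    """Roll-up: aggregate months to quarters using the concept hierarchy."""
    rolled = defaultdict(int)
    for (month, category, region), amount in cells.items():
        rolled[(MONTH_TO_QUARTER[month], category, region)] += amount
    return dict(rolled)


region_one = slice_cube(cube, 2, "One")
december_supermarket = dice_cube(cube, {"Dec."}, {"Supermarket"}, {"One", "Two"})
by_quarter = roll_up_months(cube)
print(by_quarter[("Q4", "Supermarket", "One")])
```

Drill-down is the reverse direction: it needs the finer-grained cells (here, the monthly ones) to still be available, which is why the roll-up returns a new dictionary instead of overwriting the original cube.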
Steps 1, 2 (bottom of p. 198)
Step 3 (top) and steps 1, 2, 3 (p. 199)
Step 4 (p. 199)
Steps 1, 2
Step 2
Step 3 (p. 200)
Step 3 - continued (p. 200)
Step 3 - result (p. 200)
Figure 6.11 A pie chart for income range

Pivot Tables for Hypothesis Testing
Younger cardholders purchase credit card insurance whereas more senior cardholders do not.
Figure 6.12 A pivot table showing age and credit card insurance choice
Method 1
Figure 6.13 Grouping the credit card promotion data by age
Method 2 - Steps 1, 2, 3
Figure 6.14 PivotTable Layout Wizard
Method 2 - Step 4
Steps 4, 5
Step 6
Step 7
Step 8
Result of Method 2
The average age for credit card insurance = no is approximately 41.42, whereas the average age for credit card insurance = yes is approximately 32.33.

Creating a Multidimensional Pivot Table
Investigate relationships between the magazine, watch, and life insurance promotions relative to customer gender and income range.

Figure 6.15 A credit card promotion cube
[Axes: Watch Promo, Life Insurance Promo, Magazine Promo, each with values Yes and No. Highlighted cell: Watch Promo = No, Life Insurance Promo = Yes, Magazine Promo = Yes]

Steps 1, 2, 3 (p. 206)
Step 3 (after dragging life insurance promotion to Drop Data Items Here). Continue dragging watch promotion and magazine promotion to Drop Data Items Here.
Step 3 (result)
Step 4
Decision Making – steps 1-3, p. 207
A total of two customers took advantage of the life insurance and magazine promotions but did not purchase the watch promotion.
Figure 6.16 A pivot table with page variables for credit card promotions
Result of p. 207
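
The multidimensional pivot table above amounts to counting customers in each cell of the promotion cube, optionally filtered by a page variable such as gender. A minimal Python sketch of that counting follows. The five customer records are invented for illustration and are not the book's credit card promotion dataset; they are constructed so that the No/Yes/Yes cell holds two customers, echoing the count quoted on the slide. The field names and the promotion_cube function are likewise hypothetical.

```python
from collections import Counter

# Hypothetical customer records (NOT the textbook dataset).
customers = [
    {"gender": "Male", "watch": "No", "life": "Yes", "magazine": "Yes"},
    {"gender": "Female", "watch": "No", "life": "Yes", "magazine": "Yes"},
    {"gender": "Female", "watch": "Yes", "life": "Yes", "magazine": "Yes"},
    {"gender": "Male", "watch": "No", "life": "No", "magazine": "No"},
    {"gender": "Male", "watch": "Yes", "life": "No", "magazine": "Yes"},
]


def promotion_cube(records, gender=None):
    """Count customers per (watch, life insurance, magazine) cell.

    The optional gender argument plays the role of a pivot-table page
    variable: when given, only matching records are counted.
    """
    selected = [r for r in records if gender is None or r["gender"] == gender]
    return Counter((r["watch"], r["life"], r["magazine"]) for r in selected)


all_cells = promotion_cube(customers)
female_cells = promotion_cube(customers, gender="Female")

# Cell for Watch Promo = No, Life Insurance Promo = Yes, Magazine Promo = Yes.
print(all_cells[("No", "Yes", "Yes")])
print(female_cells[("No", "Yes", "Yes")])
```

A Counter returns 0 for cells that no customer falls into, which matches how an empty pivot-table cell would be read.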