Chapter 6
The Data Warehouse
Jason C. H. Chen, Ph.D.
Professor of MIS
School of Business Administration
Gonzaga University
Spokane, WA 99223
[email protected]
A/W & Dr. Chen, Data Mining
6.1 Operational Databases
Data Modeling and Normalization
• One-to-One Relationships
• One-to-Many Relationships
• Many-to-Many Relationships
Data Modeling and Normalization
• First Normal Form
• Second Normal Form
• Third Normal Form
Figure 6.1 A simple entity-relationship diagram (entities: Vehicle-Type with attributes Type ID, Make, Year; Customer with attributes Customer ID, Income Range)
The Relational Model
Table 6.1a • Relational Table for Vehicle-Type
Type ID   Make        Year
4371      Chevrolet   1995
6940      Cadillac    2000
4595      Chevrolet   2001
2390      Cadillac    1997
Table 6.1b • Relational Table for Customer
Customer ID   Income Range ($)   Type ID
0001          70–90K             2390
0002          30–50K             4371
0003          70–90K             6940
0004          30–50K             4595
0005          70–90K             2390
Table 6.2 • Join of Tables 6.1a and 6.1b
Customer ID   Income Range ($)   Type ID   Make        Year
0001          70–90K             2390      Cadillac    1997
0002          30–50K             4371      Chevrolet   1995
0003          70–90K             6940      Cadillac    2000
0004          30–50K             4595      Chevrolet   2001
0005          70–90K             2390      Cadillac    1997
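The join of the Vehicle-Type and Customer tables on Type ID can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the textbook's procedure; the table and column names (vehicle_type, customer, type_id, and so on) are assumptions chosen for the example, and income ranges are written with a plain hyphen.

```python
import sqlite3

# In-memory database holding the two relational tables (Tables 6.1a and 6.1b).
# Table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicle_type (type_id TEXT PRIMARY KEY, make TEXT, year INTEGER);
    CREATE TABLE customer (
        customer_id  TEXT PRIMARY KEY,
        income_range TEXT,
        type_id      TEXT REFERENCES vehicle_type(type_id)
    );
""")
conn.executemany("INSERT INTO vehicle_type VALUES (?, ?, ?)", [
    ("4371", "Chevrolet", 1995),
    ("6940", "Cadillac", 2000),
    ("4595", "Chevrolet", 2001),
    ("2390", "Cadillac", 1997),
])
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", [
    ("0001", "70-90K", "2390"),
    ("0002", "30-50K", "4371"),
    ("0003", "70-90K", "6940"),
    ("0004", "30-50K", "4595"),
    ("0005", "70-90K", "2390"),
])

# Join the many side (customer) to the one side (vehicle_type) on the shared key.
joined = conn.execute("""
    SELECT c.customer_id, c.income_range, c.type_id, v.make, v.year
    FROM customer AS c
    JOIN vehicle_type AS v ON v.type_id = c.type_id
    ORDER BY c.customer_id
""").fetchall()

for row in joined:
    print(row)
```

Each customer row picks up the Make and Year of the vehicle type it references, which is why Type ID 2390 appears twice in the result.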
6.2 Data Warehouse Design
Figure 6.2 A data warehouse process model (components: Operational Database(s), External Data, ETL Routine (Extract/Transform/Load), Data Warehouse, Extract/Summarize Data, Dependent Data Mart, Independent Data Mart, Decision Support System, Report)
Entering Data into the Warehouse
• Independent Data Mart
• ETL Routine (Extract, Transform, Load)
• Metadata
Structuring the Data Warehouse: The Star Schema
• Fact Table
• Dimension Tables
• Slowly Changing Dimensions
Figure 6.3 A star schema for credit card purchases

Fact Table
Cardholder Key   Purchase Key   Location Key   Time Key   Amount
1                2              1              10         14.50
15               4              5              11         8.25
1                2              3              10         22.40
...

Purchase Dimension
Purchase Key   Category
1              Supermarket
2              Travel & Entertainment
3              Auto & Vehicle
4              Retail
5              Restaurant
6              Miscellaneous
...

Time Dimension
Time Key   Month   Day   Quarter   Year
10         Jan     5     1         2002
...

Cardholder Dimension
Cardholder Key   Name         Gender   Income Range
1                John Doe     Male     50 - 70,000
2                Sara Smith   Female   70 - 90,000
...

Location Dimension
Location Key   Street          City         State   Region
10             425 Church St   Charleston   SC      3
...
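A query against a star schema typically joins the fact table to one or more dimension tables and aggregates a measure. The sketch below totals Amount by purchase Category with Python's sqlite3 module. It is illustrative only: the table and column names are assumptions, and the three fact rows are the values shown in Figure 6.3, read row by row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchase_dim (purchase_key INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE purchase_fact (
        cardholder_key INTEGER,
        purchase_key   INTEGER REFERENCES purchase_dim(purchase_key),
        location_key   INTEGER,
        time_key       INTEGER,
        amount         REAL
    );
""")
# Dimension rows: one descriptive record per purchase category.
conn.executemany("INSERT INTO purchase_dim VALUES (?, ?)", [
    (1, "Supermarket"), (2, "Travel & Entertainment"), (3, "Auto & Vehicle"),
    (4, "Retail"), (5, "Restaurant"), (6, "Miscellaneous"),
])
# Fact rows: foreign keys into each dimension plus the measure (amount).
# ASSUMPTION: values are read row by row from Figure 6.3.
conn.executemany("INSERT INTO purchase_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 2, 1, 10, 14.50),
    (15, 4, 5, 11, 8.25),
    (1, 2, 3, 10, 22.40),
])

# Star join: fact table joined to a dimension, measure aggregated per category.
totals = dict(conn.execute("""
    SELECT d.category, SUM(f.amount)
    FROM purchase_fact AS f
    JOIN purchase_dim AS d ON d.purchase_key = f.purchase_key
    GROUP BY d.category
""").fetchall())

for category, total in sorted(totals.items()):
    print(f"{category}: {total:.2f}")
```

The fact table stays narrow (keys and a measure), while descriptive attributes such as Category live in the dimension tables and are pulled in only when a query needs them.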
The Multidimensionality of the Star Schema
Figure 6.4 Dimensions of the fact table shown in Figure 6.3 (axes: Cardholder, Purchase Key, Location Key, Time Key; labeled point A(Ci,1,2,10))
Additional Relational Schemas
• Snowflake Schema
• Constellation Schema
Figure 6.5 A constellation schema for credit card purchases and promotions

Promotion Fact Table
Cardholder Key   Promotion Key   Time Key   Response
1                1               5          Yes
2                1               5          No
...

Purchase Fact Table
Cardholder Key   Purchase Key   Location Key   Time Key   Amount
1                2              1              10         14.50
15               4              5              11         8.25
1                2              3              10         22.40
...

Promotion Dimension
Promotion Key   Description   Cost
1               watch promo   15.25
...

Time Dimension
Time Key   Month   Day   Quarter   Year
5          Dec     31    4         2001
8          Jan     3     1         2002
10         Jan     5     1         2002
...

Purchase Dimension
Purchase Key   Category
1              Supermarket
2              Travel & Entertainment
3              Auto & Vehicle
4              Retail
5              Restaurant
6              Miscellaneous

Cardholder Dimension
Cardholder Key   Name         Gender   Income Range
1                John Doe     Male     50 - 70,000
2                Sara Smith   Female   70 - 90,000
...

Location Dimension
Location Key   Street          City         State   Region
5              425 Church St   Charleston   SC      3
...
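In a constellation schema, several fact tables share dimension tables, so facts of different kinds can be related through a common key. The following sketch relates promotion responses to purchase totals through the shared Cardholder Key. It is a rough illustration in plain Python: the variable names are assumptions, and the rows are the ones shown in Figure 6.5, read row by row.

```python
# Shared dimension: cardholder key -> name (only the two rows shown in Figure 6.5).
cardholders = {1: "John Doe", 2: "Sara Smith"}

# Purchase fact rows: (cardholder key, purchase key, location key, time key, amount).
purchase_facts = [
    (1, 2, 1, 10, 14.50),
    (15, 4, 5, 11, 8.25),
    (1, 2, 3, 10, 22.40),
]

# Promotion fact rows: (cardholder key, promotion key, time key, response).
promotion_facts = [
    (1, 1, 5, "Yes"),
    (2, 1, 5, "No"),
]

# Total purchase amount per cardholder, taken from the purchase fact table.
spend = {}
for cardholder_key, _purchase, _location, _time, amount in purchase_facts:
    spend[cardholder_key] = spend.get(cardholder_key, 0.0) + amount

# Combine both fact tables through the shared cardholder dimension.
report = [
    (cardholders[key], response, round(spend.get(key, 0.0), 2))
    for key, _promotion, _time, response in promotion_facts
]

for name, response, total in report:
    print(f"{name}: promotion response={response}, purchases={total:.2f}")
```

Because both fact tables reference the same cardholder dimension, a single key is enough to line up a promotion response with the same person's purchase history.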
Decision Support: Analyzing the Warehouse Data
• Reporting Data
• Analyzing Data
• Knowledge Discovery
6.3 On-line Analytical Processing
OLAP Operations
• Slice – A single dimension operation
• Dice – A multidimensional operation
• Roll-up – Aggregation, a higher level of generalization
• Drill-down – A greater level of detail; the reverse of a roll-up
• Rotation – View data from a new perspective
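The operations listed above can be sketched on a tiny in-memory cube. This is an illustrative example, not a real OLAP engine: the function names and the dictionary layout are assumptions, and only the cell (Dec, Vehicle, Two) = 6,720 comes from the credit card purchase cube in this chapter; the other cells are hypothetical.

```python
from collections import defaultdict

DIMS = ("month", "category", "region")

# Cube cells: (month, category, region) -> purchase amount.
# ASSUMPTION: all cells except (Dec, Vehicle, Two) are made-up values.
cube = {
    ("Nov", "Vehicle", "Two"): 5000,
    ("Dec", "Vehicle", "Two"): 6720,
    ("Dec", "Vehicle", "One"): 3100,
    ("Dec", "Retail", "Two"): 1500,
    ("Nov", "Retail", "One"): 900,
}

def dice(cells, **allowed):
    """Dice: keep cells whose value on each named dimension is in the allowed set."""
    return {
        key: amount
        for key, amount in cells.items()
        if all(key[DIMS.index(dim)] in values for dim, values in allowed.items())
    }

def slice_(cells, dim, value):
    """Slice: restrict a single dimension to one value."""
    return dice(cells, **{dim: {value}})

def roll_up(cells, keep):
    """Roll-up: aggregate amounts over every dimension not listed in keep."""
    totals = defaultdict(int)
    for key, amount in cells.items():
        totals[tuple(key[DIMS.index(dim)] for dim in keep)] += amount
    return dict(totals)

def rotate(cells, order):
    """Rotation: reorder the dimensions of each cell key; amounts are unchanged."""
    return {
        tuple(key[DIMS.index(dim)] for dim in order): amount
        for key, amount in cells.items()
    }

december = slice_(cube, "month", "Dec")
december_region_two = dice(cube, month={"Dec"}, region={"Two"})
by_category = roll_up(cube, ("category",))
by_region_first = rotate(cube, ("region", "category", "month"))
```

Drill-down has no separate function here: it amounts to returning from an aggregated view such as by_category to the more detailed cells it was computed from.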
Figure 6.6 A multidimensional cube for credit card purchases (axes: Month, Jan. through Dec.; Category: Supermarket, Travel, Vehicle, Retail, Restaurant, Miscellaneous; Region: One through Four. Highlighted cell: Month = Dec., Category = Vehicle, Region = Two, Amount = 6,720, Count = 110)
Concept Hierarchy
A mapping that allows attributes to be
viewed from varying levels of detail.
Figure 6.7 A concept hierarchy for location (from most general to most specific: Region, State, City, Street Address)
Figure 6.8 Rolling up from months to quarters (axes: Time, Q1 through Q4; Category: Supermarket, Travel, Vehicle, Retail, Restaurant, Miscellaneous; Region: One through Four. Highlighted cell: Month = Oct./Nov./Dec., Category = Supermarket, Region = One)
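Rolling up from months to quarters relies on a concept hierarchy for time in which each month maps to its quarter. A minimal sketch follows, assuming calendar quarters; the monthly amounts and the function name are hypothetical and serve only to show the aggregation.

```python
from collections import defaultdict

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Concept hierarchy for time: every month maps to one calendar quarter.
MONTH_TO_QUARTER = {month: f"Q{index // 3 + 1}" for index, month in enumerate(MONTHS)}

# ASSUMPTION: hypothetical monthly amounts keyed by (month, category, region).
monthly = {
    ("Jan", "Supermarket", "One"): 1000,
    ("Oct", "Supermarket", "One"): 1200,
    ("Nov", "Supermarket", "One"): 1350,
    ("Dec", "Supermarket", "One"): 1600,
}

def roll_up_to_quarters(cells):
    """Replace each month by its quarter and sum the amounts that collapse together."""
    quarterly = defaultdict(int)
    for (month, category, region), amount in cells.items():
        quarterly[(MONTH_TO_QUARTER[month], category, region)] += amount
    return dict(quarterly)

quarterly = roll_up_to_quarters(monthly)
for key, amount in sorted(quarterly.items()):
    print(key, amount)
```

The same pattern would apply to the location hierarchy (Street Address, City, State, Region): swap in a mapping from the lower level to the higher one and aggregate.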
6.4 Excel Pivot Tables for Data Analysis
Creating a Simple Pivot Table
Figure 6.9 A pivot table template
Steps 1,2 (p.198)
Steps 2, 3
Step 3
Step 4
Step 5
Step 6
Step 7
Result of Step 7 (p.198)
Figure 6.10 A summary report for income range
Step 1, 2 (bottom of p.198)
Step 3 (top) and steps 1, 2, 3 (p.199)
Step 4 (p.199)
Steps 1,2
Step 2
Step 3 (p.200)
Step 3 - continued (p.200)
Step 3 - result (p.200)
Figure 6.11 A pie chart for income range
Pivot Tables for Hypothesis Testing
Hypothesis: Younger cardholders purchase credit card insurance, whereas more senior cardholders do not.
Figure 6.12 A pivot table showing age and credit card insurance choice
Method 1
Figure 6.13 Grouping the credit card promotion data by age
Method 2 - Steps 1, 2, 3
Figure 6.14 PivotTable Layout Wizard
Method 2 - Step 4
Steps 4, 5
Step 6
Step 7
Step 8
Result of Method 2
The average age for credit card insurance = no is approximately 41.42, whereas the average age for credit card insurance = yes is approximately 32.33.
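The comparison above is a grouped average: split the cardholders by their credit card insurance choice, then average age within each group. A small Python sketch of that calculation follows. The (age, choice) records are an assumption: they do not appear in this transcript and were chosen so that the group means match the figures quoted above.

```python
from collections import defaultdict
from statistics import mean

# ASSUMPTION: illustrative (age, credit card insurance) records, not taken
# from this transcript; chosen to reproduce the averages quoted above.
records = [
    (45, "No"), (40, "No"), (42, "No"), (43, "Yes"), (38, "No"),
    (55, "No"), (35, "Yes"), (27, "No"), (43, "No"), (41, "No"),
    (43, "No"), (29, "No"), (39, "No"), (55, "No"), (19, "Yes"),
]

# Group ages by insurance choice, as a pivot table with the choice as the row field would.
ages_by_choice = defaultdict(list)
for age, choice in records:
    ages_by_choice[choice].append(age)

# Average age per group, rounded to two decimals like the reported values.
average_age = {choice: round(mean(ages), 2) for choice, ages in ages_by_choice.items()}

for choice, avg in sorted(average_age.items()):
    print(f"credit card insurance = {choice}: average age {avg}")
```

A difference in group means is suggestive rather than conclusive; with only three cardholders in the yes group in this illustrative data, the averages alone do not establish the hypothesis.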
Creating a Multidimensional Pivot Table
Investigate relationships between the magazine, watch, and life insurance promotions relative to customer gender and income range.
Figure 6.15 A credit card promotion cube (axes: Watch Promo, Life Insurance Promo, Magazine Promo, each with values Yes and No. Highlighted cell: Watch Promo = No, Life Insurance Promo = Yes, Magazine Promo = Yes)
Steps 1,2,3 (p. 206)
Step 3 (after dragging life insurance promotion to Drop Data Items Here)
Continue dragging watch promotion and magazine promotion to Drop Data Items Here.
Step 3 (result)
Step 4
Decision Making – steps 1-3, p.207
A total of two customers took advantage of the life insurance and magazine promotions but did not purchase the watch promotion.
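Counting the customers in one cell of the promotion cube amounts to tallying each combination of the three promotion responses. The sketch below does this with a Counter. The response tuples, ordered (magazine, watch, life insurance), are an assumption: they are not listed in this transcript and were chosen to be consistent with the count of two stated above.

```python
from collections import Counter

# ASSUMPTION: illustrative responses, one tuple per customer, ordered as
# (magazine promo, watch promo, life insurance promo). Not from this transcript.
responses = [
    ("Yes", "No", "No"),
    ("Yes", "Yes", "Yes"),
    ("No", "No", "No"),
    ("Yes", "Yes", "Yes"),
    ("Yes", "No", "Yes"),
    ("No", "No", "No"),
    ("Yes", "No", "Yes"),
    ("No", "Yes", "No"),
    ("Yes", "No", "No"),
    ("Yes", "Yes", "Yes"),
    ("No", "Yes", "Yes"),
    ("No", "Yes", "Yes"),
    ("Yes", "Yes", "Yes"),
    ("No", "Yes", "No"),
    ("No", "No", "Yes"),
]

# Each distinct tuple is one cell of the three-dimensional cube; the count is its value.
promotion_cube = Counter(responses)

# Cell of interest: magazine = Yes, watch = No, life insurance = Yes.
magazine_and_life_only = promotion_cube[("Yes", "No", "Yes")]
print(magazine_and_life_only)
```

Adding gender or income range as further tuple positions would extend the cube to more dimensions in the same way that page variables do in the pivot table.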
Figure 6.16 A pivot table with page variables for credit card promotions
Result of p.207