Download Customer Data Warehouse - Diuf

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Nonlinear dimensionality reduction wikipedia , lookup

Transcript
Seminar
Prof. Dr. Andreas Meier
Nicolas Werro
Customer Data Warehouse
Alain Wahl
[email protected]
University of Fribourg, Switzerland
Department of Informatics
MAY 2005
Abstract
Big companies need to generate a huge volume of data, that needs to be converted
into information that can be used for operational and analytical purposes. Ideally
the data is stored in a data warehouse. A data warehouse is a large repository of
historical, operational and customer data. Data volume can reach the size of several
terabytes, i.e. 240 bytes of data.
Home shopping companies, retailers and banks are users of data warehouses.
Attached to a data warehouse is a set of analytic procedures for making sense out
of the data.
Contents
Contents
2
List of Figures
3
1 Introduction
4
1.1
Why a Customer Data Warehouse? . . . . . . . . . . . . . . . . . . .
2 The Customer Relationship challenge
4
5
2.1
The first challenge of Customer Relationship Management . . . . . .
5
2.2
The second challenge of Customer Relationship Management . . . . .
5
3 Data Warehouse
7
3.1
Multi-dimensional databases . . . . . . . . . . . . . . . . . . . . . . .
7
3.2
DBMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
7
3.3
Definitions of a Data Warehouse . . . . . . . . . . . . . . . . . . . . .
7
3.4
Data Warehouse Architecture . . . . . . . . . . . . . . . . . . . . . .
8
3.5
Data Warehouse Components . . . . . . . . . . . . . . . . . . . . . .
9
3.6
Data Warehouse Operations . . . . . . . . . . . . . . . . . . . . . . . 10
4 Conclusion
12
Bibliography
13
List of Figures
2.1
Treatment of the customer data. [Meier]5 . . . . . . . . . . . . . . . .
6
3.1
A database management system. . . . . . . . . . . . . . . . . . . . .
8
3.2
Data warehouse architecture. [Silberschatz et al.]8 . . . . . . . . . . .
9
7
3.3
Data Cube. [Meier et al.]
. . . . . . . . . . . . . . . . . . . . . . . .
9
3.4
Star Schema. [Silberschatz et al.]8 . . . . . . . . . . . . . . . . . . . . 10
3.5
Hierarchies on dimensions. [Silberschatz et al.]8 . . . . . . . . . . . . 11
Chapter 1
Introduction
1.1
Why a Customer Data Warehouse?
Most companies are unable to discover valuable information hidden in the data,
which prevents them from transforming data to knowledge. Data warehouses are
the ideal solution to collect all possible data from business processes. They are used
to analyse data and draw conclusions.
A customer data warehouse should answer the following questions for customer
relationship management [Meier]5 :
• What attributes describe an attractive customer or customer group?
• What is the customer value in the past, present or future?
• How loyal is a specific customer or customer group?
• What changes in the customer requirements or service quality can be traced?
• What are the preferred communictaion channels of a specific customer?
Chapter 2
The Customer Relationship
challenge
2.1
The first challenge of Customer Relationship
Management
The first challenge of CRM is to acquire new customers and lost customers with
attractive market and resource potential [Kotler et al. 2002]4 .
2.2
The second challenge of Customer Relationship Management
The second strategic strategic objective is to maintain and improve customer equity by cross- and up-selling together with retention programs during the customer
lifetime [Blattberg et al. 2001]1 .
As figure 2.1 shows, the manual treatment of customer data is still common.
The use of a customer data warehouse with data mining techniques is rare [Han and
Kamber 2001]3 .
SECTION 2.2 THE SECOND CHALLENGE OF CUSTOMER RELATIONSHIP
MANAGEMENT
6
Manual gathering of
customer data
No data collection
except for fulfillment
Survey of customer
satisfaction
Registration of
campaign reactions
Machine-aided collection of attributes
Enrollment of customer behavior
Permission marketing
External market
research
Others
0
50
100 150 200 250 300 350 400 450 500 550 600
N = 969
Figure 2.1: Treatment of the customer data. [Meier]5
Chapter 3
Data Warehouse
3.1
Multi-dimensional databases
Online analytical Processing (OLAP) is the action of data analysis. The core of
OLAP is a multi-dimensional database. The differences between OLAP and OLTP
(Online Transaction Processing) are shown in the following table:
Data
Operations
Time Period
Granularity
Time of Response
Availability
3.2
Online Transactional Processing
productive databases
read, write, delete
current date
detailed
short (ms - s)
high
Online Analytical Processing
databases for decision support
read only
historical, current, (future)
different aggregation levels
long (s - min)
medium
DBMS
A database management system (cf. figure 3.1) is, like the word it suggests, composed of two components: a database and a management system.
A multi-dimensional database management system is called a data warehouse.
3.3
Definitions of a Data Warehouse
• “A Data Warehouse is an integrated collection of all transaction data that
exist within a company to be used to consolidate and analyze daily business
transactions.” [Karchur 2000]
• “A Data Warehouse is a database, with tools, that stores current and historical
data of potential interest to managers throughout the company.” [Laudon
1999]
SECTION 3.4
DATA WAREHOUSE ARCHITECTURE
Database
DB
●
●
MS
●
+
8
Management
System
Entities & Relationships
Query and Manipulation Language
Specialized Functions (Recovery, Reorganization,
Security, Data Protection)
Figure 3.1: A database management system.
• “A Data Warehouse is a subject oriented collection of information in support
of management’s decisions.” [Schinzer 1999]
• Inmon (1996) describes data warehouses by the following keywords:
– subject-oriented: data is organized around subjects of the business
(e.g. customers) rather than around applications (e.g. insurance)
– integrated: it is consistent in the way data from several sources are
treated (e.g. coding conventions are standardized)
– time-variant: data are organized by various time periods (e.g. months)
– non-volatile: the database is not uploaded in real time.
3.4
Data Warehouse Architecture
Large companies have presences in many places, each of which may generate a large
volume of data. For instance, manufacturing-problem data and customer-complaint
data may be stored on different database systems. A data warehouse (cf. figure
3.2) is a repository of information gathered from different sources and stored under
a unified schema.
Figure 3.3 shows the core of each data warehouse: a data cube. In this example
three dimensions are chosen, i.e. time, product and area.
The schema of figure 3.4 is called a star schema. It is composed of a fact table,
multiple dimension tables and foreign keys from the fact table to the dimension
tables.
SECTION 3.5
DATA WAREHOUSE COMPONENTS
9
Data source 1
Data
loaders
Data source 2
.
.
.
DBMS
Data warehouse
Query and
analysis tools
Data source n
Figure 3.2: Data warehouse architecture. [Silberschatz et al.]8
time
Product
area
Figure 3.3: Data Cube. [Meier et al.]7
3.5
Data Warehouse Components
There are two different possibilities when and how to gather data. In a source-driven
architecture data sources transmit new information continually or periodically to the
data warhouse. In a destination-driven architecture a data-warehouse periodically
SECTION 3.6
DATA WAREHOUSE OPERATIONS
item-info
item-id
itemname
color
size
category
date-info
date
month
quarter
year
sales
item-id
store-id
customer-id
date
number
price
10
store
store-id
city
state
country
customer
customer-id
name
street
city
state
zipcode
country
Figure 3.4: Star Schema. [Silberschatz et al.]8
sends requests for new data to sources.
What schema to use? Data sources have been created independently and often have different schemas. They even may use different data models. The data
warehouse has to convert the data to the integrated schema before they are stored.
Data cleansing is the action of the data warehouse of correcting and preprocessing data. Data sources often deliver data with numerous minor inconsistencies.
3.6
Data Warehouse Operations
• Drill Down and Roll Up is used to see more of less granularity. To make this
actions possible, data need to bee stored hierarchically, like in figure 3.5.
• Slicing is the action to select a part of the data cube, e.g. all products in one
time period.
• Dicing is called the action of re-order the main dimensions of a data cube.
SECTION 3.6
DATA WAREHOUSE OPERATIONS
Year
Region
Quarter
Country
Day of week
Month
State
Hour of day
Date
City
Date Time
a) Time Hierarchy
b) Location Hierarchy
Figure 3.5: Hierarchies on dimensions. [Silberschatz et al.]8
11
Chapter 4
Conclusion
Customer data warehouses are decision-support systems to help analyze and achieve
online data collected by transaction processing systems, to help companies to make
business decisions. To achieve this objective data warehouses are used to analyse
historical data, for instance to predict trends. Warehouse schemas tend to be multidimensional, involving one or a a few very large fact tables and several much smaller
dimension tables.
Bibliography
[1] R. C. Blattberg, G. Getz, J. S. Thomas (2001). Customer Equity Building and Managing Relationships as Valuable Assets. Harvard Business
School Press: Boston.
[2] Francis Buttle (2004). Customer Relationship Management. Concepts and
Tools.
[3] J. Han, M. Kamber (2001). Data Mining - Concept and Techniques. Morgan
Kaufmann: San Francisco.
[4] P. Kotler, C.J. Dipak, S. Maesincee (2002). Marketing Moves - A New
Approach to Profits, Grothw, and Renewal. Harward Business School Press:
Boston.
[5] Andreas Meier. A Data Warehouse Approach to Customer Relationship Management.
[6] Andreas Meier (2004). Relationale Datenbanken - Leitfaden für dir Praxis.
Fünfte Auflage.
[7] Andreas Meier, Stefan Hüsemann, Thomas Zurkinden (2004). Data
Warehouse Architecture.
[8] Silberschatz, Korth and Sudarshan (2002). Database. System Concepts.
Fourth edition.
[9] R. T. Watson (1999). Data Management: Databases and Organisations.