Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Seminar Prof. Dr. Andreas Meier Nicolas Werro Customer Data Warehouse Alain Wahl [email protected] University of Fribourg, Switzerland Department of Informatics MAY 2005 Abstract Big companies need to generate a huge volume of data, that needs to be converted into information that can be used for operational and analytical purposes. Ideally the data is stored in a data warehouse. A data warehouse is a large repository of historical, operational and customer data. Data volume can reach the size of several terabytes, i.e. 240 bytes of data. Home shopping companies, retailers and banks are users of data warehouses. Attached to a data warehouse is a set of analytic procedures for making sense out of the data. Contents Contents 2 List of Figures 3 1 Introduction 4 1.1 Why a Customer Data Warehouse? . . . . . . . . . . . . . . . . . . . 2 The Customer Relationship challenge 4 5 2.1 The first challenge of Customer Relationship Management . . . . . . 5 2.2 The second challenge of Customer Relationship Management . . . . . 5 3 Data Warehouse 7 3.1 Multi-dimensional databases . . . . . . . . . . . . . . . . . . . . . . . 7 3.2 DBMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.3 Definitions of a Data Warehouse . . . . . . . . . . . . . . . . . . . . . 7 3.4 Data Warehouse Architecture . . . . . . . . . . . . . . . . . . . . . . 8 3.5 Data Warehouse Components . . . . . . . . . . . . . . . . . . . . . . 9 3.6 Data Warehouse Operations . . . . . . . . . . . . . . . . . . . . . . . 10 4 Conclusion 12 Bibliography 13 List of Figures 2.1 Treatment of the customer data. [Meier]5 . . . . . . . . . . . . . . . . 6 3.1 A database management system. . . . . . . . . . . . . . . . . . . . . 8 3.2 Data warehouse architecture. [Silberschatz et al.]8 . . . . . . . . . . . 9 7 3.3 Data Cube. [Meier et al.] . . . . . . . . . . . . . . . . . . . . . . . . 9 3.4 Star Schema. [Silberschatz et al.]8 . . . . . . . . . . . . . . . . . . . . 10 3.5 Hierarchies on dimensions. [Silberschatz et al.]8 . . . . . . . . . . . . 11 Chapter 1 Introduction 1.1 Why a Customer Data Warehouse? Most companies are unable to discover valuable information hidden in the data, which prevents them from transforming data to knowledge. Data warehouses are the ideal solution to collect all possible data from business processes. They are used to analyse data and draw conclusions. A customer data warehouse should answer the following questions for customer relationship management [Meier]5 : • What attributes describe an attractive customer or customer group? • What is the customer value in the past, present or future? • How loyal is a specific customer or customer group? • What changes in the customer requirements or service quality can be traced? • What are the preferred communictaion channels of a specific customer? Chapter 2 The Customer Relationship challenge 2.1 The first challenge of Customer Relationship Management The first challenge of CRM is to acquire new customers and lost customers with attractive market and resource potential [Kotler et al. 2002]4 . 2.2 The second challenge of Customer Relationship Management The second strategic strategic objective is to maintain and improve customer equity by cross- and up-selling together with retention programs during the customer lifetime [Blattberg et al. 2001]1 . As figure 2.1 shows, the manual treatment of customer data is still common. The use of a customer data warehouse with data mining techniques is rare [Han and Kamber 2001]3 . SECTION 2.2 THE SECOND CHALLENGE OF CUSTOMER RELATIONSHIP MANAGEMENT 6 Manual gathering of customer data No data collection except for fulfillment Survey of customer satisfaction Registration of campaign reactions Machine-aided collection of attributes Enrollment of customer behavior Permission marketing External market research Others 0 50 100 150 200 250 300 350 400 450 500 550 600 N = 969 Figure 2.1: Treatment of the customer data. [Meier]5 Chapter 3 Data Warehouse 3.1 Multi-dimensional databases Online analytical Processing (OLAP) is the action of data analysis. The core of OLAP is a multi-dimensional database. The differences between OLAP and OLTP (Online Transaction Processing) are shown in the following table: Data Operations Time Period Granularity Time of Response Availability 3.2 Online Transactional Processing productive databases read, write, delete current date detailed short (ms - s) high Online Analytical Processing databases for decision support read only historical, current, (future) different aggregation levels long (s - min) medium DBMS A database management system (cf. figure 3.1) is, like the word it suggests, composed of two components: a database and a management system. A multi-dimensional database management system is called a data warehouse. 3.3 Definitions of a Data Warehouse • “A Data Warehouse is an integrated collection of all transaction data that exist within a company to be used to consolidate and analyze daily business transactions.” [Karchur 2000] • “A Data Warehouse is a database, with tools, that stores current and historical data of potential interest to managers throughout the company.” [Laudon 1999] SECTION 3.4 DATA WAREHOUSE ARCHITECTURE Database DB ● ● MS ● + 8 Management System Entities & Relationships Query and Manipulation Language Specialized Functions (Recovery, Reorganization, Security, Data Protection) Figure 3.1: A database management system. • “A Data Warehouse is a subject oriented collection of information in support of management’s decisions.” [Schinzer 1999] • Inmon (1996) describes data warehouses by the following keywords: – subject-oriented: data is organized around subjects of the business (e.g. customers) rather than around applications (e.g. insurance) – integrated: it is consistent in the way data from several sources are treated (e.g. coding conventions are standardized) – time-variant: data are organized by various time periods (e.g. months) – non-volatile: the database is not uploaded in real time. 3.4 Data Warehouse Architecture Large companies have presences in many places, each of which may generate a large volume of data. For instance, manufacturing-problem data and customer-complaint data may be stored on different database systems. A data warehouse (cf. figure 3.2) is a repository of information gathered from different sources and stored under a unified schema. Figure 3.3 shows the core of each data warehouse: a data cube. In this example three dimensions are chosen, i.e. time, product and area. The schema of figure 3.4 is called a star schema. It is composed of a fact table, multiple dimension tables and foreign keys from the fact table to the dimension tables. SECTION 3.5 DATA WAREHOUSE COMPONENTS 9 Data source 1 Data loaders Data source 2 . . . DBMS Data warehouse Query and analysis tools Data source n Figure 3.2: Data warehouse architecture. [Silberschatz et al.]8 time Product area Figure 3.3: Data Cube. [Meier et al.]7 3.5 Data Warehouse Components There are two different possibilities when and how to gather data. In a source-driven architecture data sources transmit new information continually or periodically to the data warhouse. In a destination-driven architecture a data-warehouse periodically SECTION 3.6 DATA WAREHOUSE OPERATIONS item-info item-id itemname color size category date-info date month quarter year sales item-id store-id customer-id date number price 10 store store-id city state country customer customer-id name street city state zipcode country Figure 3.4: Star Schema. [Silberschatz et al.]8 sends requests for new data to sources. What schema to use? Data sources have been created independently and often have different schemas. They even may use different data models. The data warehouse has to convert the data to the integrated schema before they are stored. Data cleansing is the action of the data warehouse of correcting and preprocessing data. Data sources often deliver data with numerous minor inconsistencies. 3.6 Data Warehouse Operations • Drill Down and Roll Up is used to see more of less granularity. To make this actions possible, data need to bee stored hierarchically, like in figure 3.5. • Slicing is the action to select a part of the data cube, e.g. all products in one time period. • Dicing is called the action of re-order the main dimensions of a data cube. SECTION 3.6 DATA WAREHOUSE OPERATIONS Year Region Quarter Country Day of week Month State Hour of day Date City Date Time a) Time Hierarchy b) Location Hierarchy Figure 3.5: Hierarchies on dimensions. [Silberschatz et al.]8 11 Chapter 4 Conclusion Customer data warehouses are decision-support systems to help analyze and achieve online data collected by transaction processing systems, to help companies to make business decisions. To achieve this objective data warehouses are used to analyse historical data, for instance to predict trends. Warehouse schemas tend to be multidimensional, involving one or a a few very large fact tables and several much smaller dimension tables. Bibliography [1] R. C. Blattberg, G. Getz, J. S. Thomas (2001). Customer Equity Building and Managing Relationships as Valuable Assets. Harvard Business School Press: Boston. [2] Francis Buttle (2004). Customer Relationship Management. Concepts and Tools. [3] J. Han, M. Kamber (2001). Data Mining - Concept and Techniques. Morgan Kaufmann: San Francisco. [4] P. Kotler, C.J. Dipak, S. Maesincee (2002). Marketing Moves - A New Approach to Profits, Grothw, and Renewal. Harward Business School Press: Boston. [5] Andreas Meier. A Data Warehouse Approach to Customer Relationship Management. [6] Andreas Meier (2004). Relationale Datenbanken - Leitfaden für dir Praxis. Fünfte Auflage. [7] Andreas Meier, Stefan Hüsemann, Thomas Zurkinden (2004). Data Warehouse Architecture. [8] Silberschatz, Korth and Sudarshan (2002). Database. System Concepts. Fourth edition. [9] R. T. Watson (1999). Data Management: Databases and Organisations.