Data Warehousing Fundamentals
Volume 1 • Student Guide

50102GC20
Production 2.0
May 1999
M08761

Authors
Chon S. Chua
Richard Green

Technical Contributors and Reviewers
Jackie Collins
Jennifer Jacoby
Mike Schmitz
John Haydu
Russ Pitts
Lauran Serhal
Brian Pottle
Donna Corrigan
Patricia Moll
Harry Penbert
SuiWah Chan
Joel Barkin
Steve Dressler

Publisher
Tony McGettigan

Copyright © Oracle Corporation, 1999. All rights reserved.

This documentation contains proprietary information of Oracle Corporation. It is provided under a license agreement containing restrictions on use and disclosure and is also protected by copyright law. Reverse engineering of the software is prohibited. If this documentation is delivered to a U.S. Government Agency of the Department of Defense, then it is delivered with Restricted Rights and the following legend is applicable:

Restricted Rights Legend
Use, duplication or disclosure by the Government is subject to restrictions for commercial computer software and shall be deemed to be Restricted Rights software under Federal law, as set forth in subparagraph (c) (1) (ii) of DFARS 252.227-7013, Rights in Technical Data and Computer Software (October 1988).

This material or any portion of it may not be copied in any form or by any means without the express prior written permission of Oracle Corporation. Any other copying is a violation of copyright law and may result in civil and/or criminal penalties.

If this documentation is delivered to a U.S. Government Agency not within the Department of Defense, then it is delivered with “Restricted Rights,” as defined in FAR 52.227-14, Rights in Data-General, including Alternate III (June 1987).

The information in this document is subject to change without notice. If you find any problems in the documentation, please report them in writing to Education Products, Oracle Corporation, 500 Oracle Parkway, Box SB-6, Redwood Shores, CA 94065. Oracle Corporation does not warrant that this document is error-free.

Data Warehouse Method - A Methodology for Designing Data Warehouse, SQL*Loader, PL/SQL, Pro*C, Oracle7, Oracle8, and Oracle8i, Distributed Option, Parallel Query Option, Parallel Server Option, Media Server, Spatial Data Option, ConText Option, Video Server, Text Server, WebServer, Oracle Universal Server ROLAP Option, Express Server, Web-enabled Express Server, SQL*Net, Developer/2000, Relational Access Manager, Discoverer, Designer/2000, SQL*Bridge, Transparent Gateway Developer’s Kit, Procedural Gateway Developer’s Kit, Express, Express Analyzer, Express Objects, Sales Analyzer, and Financial Analyzer are product names, trademarks, or registered trademarks of Oracle Corporation.

All other products or company names are used for identification purposes only and may be trademarks of their respective owners.

Contents
Preface
  Profile
  Related Publications
  Typographic Conventions

Lesson 1: Introduction
  Course Objectives
  Agenda
  Questions About You

Lesson 2: Meeting a Business Need
  Overview
  Unsuitability of OLTP Systems for Complex Analysis
  Management Information Systems and Decision Support
  Data Extract Processing
  Business Drivers for Data Warehouses
  Current Situation and Growth of Data Warehousing
  Typical Uses of a Data Warehouse
  Summary
  Practice 2-1

Lesson 3: Defining Data Warehouse Concepts and Terminology
  Overview
  Data Warehouse Definition
  Data Warehouse Properties
  Data Warehouse Terminology
  Components of a Data Warehouse
  Oracle Warehouse Vision, Products, and Services
  Summary
  Practice 3-1

Lesson 4: Driving Implementation Through a Methodology
  Overview
  Warehouse Development Approaches
  The Need for an Iterative and Incremental Methodology
  Oracle Data Warehouse Method
  DWM Fundamental Elements
  Oracle Warehouse Technology Initiative (WTI)
  Summary
  Practice 4-1

Lesson 5: Planning for a Successful Warehouse
  Overview
  Managing Financial Issues
  Obtaining Business Commitment
  Managing a Warehouse Project
  Identifying Planning Phases
  Identifying Warehouse Strategy Phase Deliverables
  Identifying Project Scope Phase Deliverables
  Summary
  Practice 5-1

Lesson 6: Analyzing User Query Needs
  Overview
  Types of Users
  Gathering User Requirements
  Managing User Data Access
  Security
  OLAP
  Query Access Architectures
  Summary
  Practice 6-1

Lesson 7: Modeling the Data Warehouse
  Overview
  Data Warehouse Database Design Phases
  Phase One: Defining the Business Model
  Phase Two: Creating the Dimensional Model
  Data Modeling Tools
  Summary
  Practice 7-1

Lesson 8: Choosing a Computing Architecture
  Overview
  Architecture Requirements
  The Hardware Architecture
  Database Server Requirements
  Parallel Processing
  Summary
  Practice 8-1

Lesson 9: Planning Warehouse Storage
  Overview
  The Server Data Architecture
  Protecting the Database
  Summary
  Practice 9-1

Lesson 10: Building the Warehouse
  Overview
  Extracting, Transforming, and Transporting Data
  Extracting Data
  Examining Data Sources
  Extraction Techniques
  Extraction Tools
  Summary
  Practice 10-1

Lesson 11: Transforming Data
  Overview
  Importance of Data Quality
  Transformation
  Transforming Data: Problems and Solutions
  Transformation Techniques
  Transformation Tools
  Summary
  Practice 11-1

Lesson 12: Transportation: Loading Warehouse Data
  Overview
  Transporting Data into the Warehouse
  Building the Transportation Process
  Transporting the Data
  Postprocessing of Loaded Data
  Summary
  Practice 12-1

Lesson 13: Transportation: Refreshing Warehouse Data
  Overview
  Capturing Changed Data
  Limitations of Methods for Applying Changes
  Purging and Archiving Data
  Final Tasks
  Selecting ETT Tools
  Summary
  Practice 13-1

Lesson 14: Leaving a Metadata Trail
  Overview
  Defining Warehouse Metadata
  Developing a Metadata Strategy
  Examining Types of Metadata
  Metadata Management Tools
  Common Warehouse Metadata
  Summary
  Practice 14-1

Lesson 15: Supporting End-User Access
  Overview
  Business Intelligence
  Multidimensional Query Techniques
  Categories of Business Intelligence Tools
  Data Mining in a Warehouse Environment
  Oracle Data Mining Partners
  Summary
  Practice 15-1

Lesson 16: Web-Enabling the Warehouse
  Overview
  Accessing the Warehouse Over the Web
  Common Web Data Warehouse Architecture
  Issues in Deploying a Data Warehouse on the Web
  Evaluating Web-Based Tools
  Summary
  Practice 16-1

Lesson 17: Managing the Data Warehouse
  Overview
  Managing the Transition to Production
  Managing Growth
  Managing Backup and Recovery
  Identifying Data Warehouse Performance Issues
  Summary

Appendix A: Practice Solutions
  Practice 2-1
  Practice 3-1
  Practice 4-1
  Practice 5-1
  Practice 6-1
  Practice 7-1
  Practice 8-1
  Practice 9-1
  Practice 10-1
  Practice 11-1
  Practice 12-1
  Practice 13-1
  Practice 14-1
  Practice 15-1
  Practice 16-1

Glossary

Preface

Profile

Before You Begin This Course
This course is the entry-level course in the Data Warehousing curriculum. Therefore, there are no prerequisites to this course.

Prerequisites
There are no prerequisites for this course.
How This Course Is Organized
Data Warehousing Fundamentals is an instructor-led course featuring lectures and paper-and-pencil exercises, as well as group discussions, to reinforce the concepts and skills introduced.

Lesson 1: Introduction
Aim: In this lesson, the class format is reviewed, the class agenda is described, and students introduce themselves. Because this class is expected to appeal to a broad audience, the introduction will give the instructor an idea of the composition of the class in terms of data warehouse knowledge, Oracle knowledge, and the specific role that each student plays with regard to data warehousing.

Lesson 2: Meeting a Business Need
Aim: This lesson examines how data warehousing has evolved from early management information systems to today’s decision support systems. The primary motivating factors for data warehouse creation are explored. The types of industries employing data warehouses are considered.

Lesson 3: Defining Data Warehouse Concepts and Terminology
Aim: This lesson introduces the Oracle definition of a data warehouse. The lesson offers a general description of the properties of a data warehouse. The standard components and tools required to build, operate, and use a data warehouse are identified.

Lesson 4: Driving Implementation Through a Methodology
Aim: This lesson introduces the Oracle Data Warehouse Method (DWM), a methodology employed by Oracle Consulting Services for incremental development of a total warehouse solution using a phased development approach. Partnering initiatives launched by Oracle are described.

Lesson 5: Planning for a Successful Warehouse
Aim: This lesson introduces the planning that is critical to the success of a data warehouse project. Planning phases, deliverables, and project roles are identified. Overall warehouse strategy and project scope are defined.

Lesson 6: Analyzing User Query Needs
Aim: This lesson describes the analysis required to identify and categorize users who may need to access data from the warehouse, and how their requirements differ. Data access and reporting tools are considered.

Lesson 7: Modeling the Data Warehouse
Aim: This lesson examines the role of data modeling in a data warehousing environment. The lesson presents a very high-level overview of warehouse modeling steps. You consider the different types of models that can be employed, such as the star schema. Tools available for warehouse modeling are introduced.

Lesson 8: Choosing a Computing Architecture
Aim: This lesson examines the computer architectures that commonly support data warehouses. The benefits of each hardware architecture and reasons for using distributed warehouses are examined. Students examine the technology requirements of a database server for warehousing.

Lesson 9: Planning Warehouse Storage
Aim: This lesson examines database setup and management issues such as partitioning, indexing, and ways to protect your database.

Lesson 10: Building the Warehouse
Aim: In this lesson, you explore the sources of data for the data warehouse. You consider how the extraction and transformation processes take data from source systems and change it into data that is acceptable to the users of the data warehouse. The lesson also describes typical data anomalies and looks at ways to eliminate them.

Lesson 11: Transforming Data
Aim: In this lesson, you explore how the transformation process transforms data from source systems into data suitable for end-user query and analysis applications.

Lesson 12: Transportation: Loading Warehouse Data
Aim: In this lesson, you examine how the extracted and transformed data is transported into the warehouse.

Lesson 13: Transportation: Refreshing Warehouse Data
Aim: In this lesson, you examine methods for updating the warehouse with changed data, after the first-time load.

Lesson 14: Leaving a Metadata Trail
Aim: This lesson focuses on the concept of warehouse metadata, and the role it plays in a well-developed and managed warehousing environment.

Lesson 15: Supporting End-User Access
Aim: This lesson investigates the ways that users may access the data in the data warehouse. Students are introduced to the concept of business intelligence. The lesson discusses the discovery model used by mining tools, and the reasons enterprises are looking at data mining solutions for discovery of information.

Lesson 16: Web-Enabling the Warehouse
Aim: This lesson discusses how to take advantage of the Web to deploy data warehouse information. It addresses internal and external access, as well as the advantages of Web-enabling a data warehouse. The lesson outlines the steps involved in deploying a Web-enabled data warehouse. Challenges in deploying a Web-enabled data warehouse are also discussed.

Lesson 17: Managing the Data Warehouse
Aim: This lesson explores the management issues, critical success factors, and challenges to successful data warehouse implementation. The lesson addresses issues pertaining to the management of the entire warehouse life cycle.
Related Publications

Oracle Publications
• Oracle8i for Data Warehousing: Fast and Simple for More Data and More Users (Nov 1998); URL: http://websight.us.oracle.com
• Large Scale Data Warehousing with Oracle8i, Winter Corporation Sponsored Research Program; URL: http://websight.us.oracle.com
• DWM Handbook V1.0.0

Additional Publications
• Oracle DBA Handbook, Loney, Kevin; Osborne McGraw-Hill; ISBN: 007882406.
• Oracle: The Complete Reference, Koch, George and Kevin Loney; Oracle Press; ISBN: 007882396X.
• The Data Warehouse Toolkit, Kimball, Ralph; John Wiley & Sons; ISBN: 0471153370.
• Building the Data Warehouse, Inmon, W.; John Wiley & Sons; ISBN: 0471141615.
• Oracle8 Data Warehousing, Dodge, Gary and Gorman, T.; John Wiley & Sons; ISBN: 0471199524.
• The Data Warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing, and Deploying Data Warehouses, Kimball, Ralph and others; John Wiley & Sons, 1998; ISBN: 0471255475.
• Data Warehouse Design Solutions, Adamson, C. and Venerable, M.; John Wiley & Sons, 1998; ISBN: 0-471-25195-X.
• Data Warehousing: Architecture and Implementation, Humphries, M. et al.; Prentice Hall PTR, 1999; ISBN: 0-13-080902-0.

Web Sites
• Data Warehouse Institute Web site, at http://www.dw-institute.com/index.htm
• The Data Warehouse Information Center Web site, at http://pwp.starnetinc.com/larryg/index.html
• The Data Warehouse.com Web site, at http://data-warehouse.com/
• The Data Warehouse Knowledge Center Web site, at http://www.datawarehouse.org
Typographic Conventions

Typographic Conventions in Text

Bold italic
  Element: Glossary term (if there is a glossary)
  Example: The algorithm inserts the new key.

Caps and lowercase
  Element: Buttons, check boxes, triggers, windows
  Examples: Click the Executable button. Select the Can’t Delete Card check box. Assign a When-Validate-Item trigger . . . Open the Master Schedule window.

Courier new, case sensitive (default is lowercase)
  Element: Code output, directory names, filenames, passwords, pathnames, URLs, user input, usernames
  Examples:
    Code output: debug.seti('I',300);
    Directory: bin (DOS), $FMHOME (UNIX)
    Filename: Locate the init.ora file.
    Password: Use tiger as your password.
    Pathname: Open c:\my_docs\projects
    URL: Go to http://www.oracle.com
    User input: Enter 300
    Username: Log on as scott

Initial cap
  Element: Graphics labels (unless the term is a proper noun)
  Example: Customer address (but Oracle Payables)

Italic
  Element: Emphasized words and phrases, titles of books and courses, variables
  Examples: Do not save changes to the database. For further information, see Oracle7 Server SQL Language Reference Manual. Enter [email protected], where user_id is the name of the user.

Quotation marks
  Element: Interface elements with long names that have only initial caps; lesson and chapter titles in cross-references
  Examples: Select “Include a reusable module component” and click Finish. This subject is covered in Unit II, Lesson 3, “Working with Objects.”

Uppercase
  Element: SQL column names, commands, functions, schemas, table names
  Example: Use the SELECT command to view information stored in the LAST_NAME column of the EMP table.

Arrow
  Element: Menu paths
  Example: Select File->Save.

Brackets
  Element: Key names
  Example: Press [Enter].

Commas
  Element: Key sequences
  Example: Press and release these keys one at a time: [Alt], [F], [D]

Plus signs
  Element: Key combinations
  Example: Press and hold these keys simultaneously: [Ctrl]+[Alt]+[Del]

Typographic Conventions in Code

Caps and lowercase
  Element: Oracle Forms triggers
  Example: When-Validate-Item

Lowercase
  Element: Column names, table names
  Example: SELECT last_name FROM s_emp;
  Element: Passwords
  Example: DROP USER scott IDENTIFIED BY tiger;
  Element: PL/SQL objects
  Example: OG_ACTIVATE_LAYER (OG_GET_LAYER ('prod_pie_layer'))

Lowercase italic
  Element: Syntax variables
  Example: CREATE ROLE role

Uppercase
  Element: SQL commands and functions
  Example: SELECT userid FROM emp;

Typographic Conventions in Navigation Paths
This course uses simplified navigation paths, such as the following example, to direct you through Oracle Applications.

(N) Invoice->Entry->Invoice Batches Summary (M) Query->Find (B) Approve

This simplified path translates to the following:
1. (N) From the Navigator window, select Invoice->Entry->Invoice Batches Summary.
2. (M) From the menu bar, select Query->Find.
3. (B) Click the Approve button.

N = Navigator, M = Menu, B = Button

1
Introduction

Lesson 1: Introduction
Course Objectives
After completing this course, you should be able to do the following:
• Explain why data warehousing is a popular solution in today’s information technology environment
• Describe the terminology used with data warehousing
• Identify the standard components of a data warehouse implementation
• Explain the importance of using a methodology for development, and specifically identify the phases of the Oracle Data Warehouse Method
• Identify and use data warehouse modeling concepts
• Identify the different processes required to manage and maintain the warehouse
• Identify the hardware platforms that can be employed with a data warehouse
• Identify the features required of a database server for a warehouse implementation
• Identify the tools that can be used at each phase during the data warehouse development cycle
• Describe user profiles and the techniques users may employ for querying the warehouse
• Identify data warehousing implementation issues and challenges
• Position the products for the Oracle warehouse

Agenda

Day 1
• Lesson 1: Introduction
• Lesson 2: Meeting a Business Need
• Lesson 3: Defining Data Warehouse Concepts and Terminology
• Lesson 4: Driving Implementation Through a Methodology
• Lesson 5: Planning for a Successful Warehouse
• Lesson 6: Analyzing User Query Needs

Day 2
• Lesson 7: Modeling the Data Warehouse
• Lesson 8: Choosing a Computing Architecture
• Lesson 9: Planning Warehouse Storage
• Lesson 10: Building the Warehouse
• Lesson 11: Transforming Data
• Lesson 12: Transportation: Loading Warehouse Data

Day 3
• Lesson 13: Transportation: Refreshing Warehouse Data
• Lesson 14: Leaving a Metadata Trail
• Lesson 15: Supporting End-User Access
• Lesson 16: Web-Enabling the Warehouse
• Lesson 17: Managing the Data Warehouse

Questions About You
To tailor the class to your specific needs and to encourage dialog among all, please answer the following questions:
• What is your name and company?
• What is your role in your organization?
• What is your level of Oracle expertise?
• Why are you building a data warehouse or data mart?
• What do you hope to get out of this class?

You will get a lot more out of this class if you are aware of the background of your classmates and the issues that they face in the development of a data warehouse. Each student has a unique perspective and an experience and knowledge set from which we can learn.

Because this class is expected to appeal to a broad audience, the introduction will give the instructor an idea of the composition of the class in terms of data warehouse knowledge, Oracle knowledge, and the specific role that each student plays with regard to data warehousing.

2
Meeting a Business Need

Lesson 2: Meeting a Business Need

[Slide: course road map. Vertical box: Meeting a Business Need. Other boxes: Defining DW Concepts & Terminology; Planning for a Successful Warehouse; Choosing a Computing Architecture; Planning Warehouse Storage; Modeling the Data Warehouse; ETT (Building the Warehouse); Analyzing User Query Needs; Supporting End User Access; Managing the Data Warehouse. Horizontal box across the bottom: Project Management (Methodology, Maintaining Metadata).]
Overview
The road map slide for this lesson represents the flow of the course. The vertical box entitled “Meeting a Business Need” emphasizes that the warehouse is business driven. The determination of the warehouse architecture, data model, and user query needs all stem from business requirements. The horizontal box running across the bottom represents the ongoing project management throughout the warehouse lifecycle.

This lesson examines how data warehousing has evolved from early management information systems to today’s decision support systems. The primary motivating factors for data warehouse creation are explored. The types of industries employing data warehouses are considered.

Objectives
After completing this lesson, you should be able to do the following:
• Describe why an online transaction processing (OLTP) system is not suitable for complex analysis
• Describe how extract processing for decision support querying led to data warehouse solutions employed today
• Explain why businesses are driven to employ data warehouse technology
• Identify some of the industries that employ data warehouses
[Slide: Why OLTP Is Not Suitable for Complex Analysis]

OLTP                                         Complex analysis
Information to support day-to-day service    Historical information to analyze
Data stored at transaction level             Data needs to be integrated
Database design: normalized                  Database design: denormalized, star schema

Unsuitability of OLTP Systems for Complex Analysis

Operational systems largely exist to support transactions, for example, the booking of an airline ticket. Decision support, which is a type of complex analysis, is very different from OLTP. Most OLTP transactions locate and update a single record in a database, or add one or more new records. Even a simple decision support query such as “How many luxury cars did we sell in Boston in January 1999?” requires very different operations at the database level from an OLTP transaction: a potentially large number of records must be located, and there are no update operations at all.

Characteristics of OLTP Systems

The characteristics of OLTP systems are described below.
Characteristic                      OLTP
Typical operation                   Update
Level of analytical requirements    Low
Screens                             Unchanging
Amount of data per transaction      Small
Data level                          Detailed
Age of data                         Current
Orientation                         Records

Why OLTP Is Not Suitable for Complex Analysis

OLTP databases are fully normalized and are designed to store operational data consistently, one transaction at a time. Complex analysis, on the other hand, requires a database design that even business users find directly usable. To achieve this, different database design techniques are required, for example, dimensional models and star schemas with highly denormalized dimension tables.

OLTP focuses on recording and completing different types of business transactions, but it cannot provide decision makers with the information they need. The data needed for complex analysis is scattered throughout different OLTP systems and must first be carefully integrated. Extracting the data from these OLTP systems demands so much of the system resources that the IT professional must wait until nonoperational hours before running the queries required to produce the report.

OLTP systems are therefore not suitable for complex analysis, for three reasons:
• The database design is not optimized to run such queries.
• There is no integrated pool of data from all the operational systems within the enterprise from which business users can perform complex analysis.
• The historical data needed for complex analysis is not stored.
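The contrast between the two workloads can be illustrated with a minimal Python sketch that uses the standard sqlite3 module. The `sales` table, its columns, and its rows are hypothetical and exist only for illustration; they are not taken from the course. The first statement behaves like an OLTP transaction (one record located by key and updated), while the second behaves like the luxury-car question above (many records read, none updated).

```python
import sqlite3

# Hypothetical in-memory table standing in for an operational database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, city TEXT, "
    "car_class TEXT, sale_date TEXT, status TEXT)"
)
rows = [
    (1, "Boston", "luxury", "1999-01-05", "open"),
    (2, "Boston", "luxury", "1999-01-19", "open"),
    (3, "Boston", "compact", "1999-01-20", "open"),
    (4, "Chicago", "luxury", "1999-01-11", "open"),
    (5, "Boston", "luxury", "1999-02-02", "open"),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", rows)

# OLTP-style operation: locate a single record by key and update it.
cursor = conn.execute(
    "UPDATE sales SET status = 'closed' WHERE sale_id = ?", (3,)
)
rows_updated = cursor.rowcount

# Decision-support-style query: examine many records, update nothing.
luxury_boston_jan = conn.execute(
    "SELECT COUNT(*) FROM sales "
    "WHERE city = 'Boston' AND car_class = 'luxury' "
    "AND sale_date BETWEEN '1999-01-01' AND '1999-01-31'"
).fetchone()[0]

print(rows_updated, luxury_boston_jan)
```

On a table of realistic size, the second statement has to examine every qualifying sale, which is why the two kinds of work compete for resources when they share a system.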
[Slide: Management Information Systems and Decision Support. Figure labels: production platforms, operational reports, ad hoc access, decision makers. Points: MIS systems provided business data; reports were developed on request; reports provided little analysis capability; decision support tools gave personal ad hoc access to data.]

[Slide: Analyzing Data from Operational Systems. Points: data structures are complex; systems are designed for high performance and throughput; data is not meaningfully represented; data is dispersed; OLTP systems may be unsuitable for intensive queries.]

Management Information Systems and Decision Support

Early Management Information Systems

Early management information systems (MIS) provided management with reports to assess the performance of the business. Report requirements were submitted as a request to the MIS development team, who developed the report and made it available to the user days, weeks, or even months later. The data in the reports was presented in a way that was difficult to use for analysis and forecasting.

Personal Computing

With the advent of personal computing and 4GL programming techniques, MIS became known as decision support (decision support systems, or DSS). DSS was judged to support business users better by giving them direct access to the operational data for ad hoc querying, which provided more flexible reporting as the information was needed.
Analyzing Data from Operational Systems

Although decision support tools are friendly, intuitive, and easy to use, the structure of data in online transaction processing systems often does not support the user’s real analytical requirements.
• The structure of the operational data is often complex and too highly structured (3NF).
• The system was designed for high-performance, high-throughput online transaction processing rather than CPU-intensive analysis of information.
• The data is not always meaningfully presented to the end user query tool.
• The same data elements may be defined differently in each operational system. For example, a customer record may hold the customer telephone number. In one system this number is stored as a 15-digit number, and in another as a 20-character alphanumeric value.
• Data is dispersed on multiple and diverse systems, leading to data redundancy and the inability to coordinate data between systems to provide a global picture of the business.
• Running online transaction processing and decision support concurrently on one machine degrades performance of the operational system, response time to users, and performance of networks. The overall impact on the operational system may be too great.

[Slide: Data Extract Processing. Figure labels: operational systems, extracts, decision makers. Points: end user computing offloaded from the operational environment; user’s own data.]

[Slide: Management Issues. Figure labels: operational systems, extracts, decision makers, extract explosion.]
Data Extract Processing

DSS and Degradation

The problem of performance degradation was partially solved by extract processing techniques, which select data from one environment and transport it to another environment for user access (a data extract).

Data Extract Program

The data extract program searches through files and databases, gathering data according to specific criteria. The data is then placed into a separate set of files, which may reside in another environment, for use by analysts for decision support activities.

Extract processing was a logical progression from decision support systems. It was seen as a way to move the data from the high-performance, high-throughput online transaction processing systems onto client machines dedicated to analysis. Extract processing also gave the user ownership of the data.

Management Issues with Data Extract Programs

Although the principle of extracts appears logical, and to some degree represents a model similar to the way a data warehouse works, there are problems with processing extracts. Extract programs may become the source for other extracts, and extract management can become a full-time task for information systems departments. In some companies hundreds of extract programs are run at any time.
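The behavior of a data extract program described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the order data, the column names, and the `extract` helper are hypothetical, and in-memory text buffers stand in for the operational file and the separate extract file.

```python
import csv
import io

# Hypothetical operational data; in practice this would be a production file or table.
operational_file = io.StringIO(
    "order_id,region,amount\n"
    "1,EAST,120.00\n"
    "2,WEST,75.50\n"
    "3,EAST,310.25\n"
)


def extract(source, criteria):
    """Return the rows of `source` for which `criteria(row)` is true."""
    return [row for row in csv.DictReader(source) if criteria(row)]


# Gather data according to a specific criterion (here, a single region).
east_rows = extract(operational_file, lambda row: row["region"] == "EAST")

# Place the selected rows into a separate file for use by analysts.
extract_file = io.StringIO()
writer = csv.DictWriter(
    extract_file,
    fieldnames=["order_id", "region", "amount"],
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(east_rows)

print(extract_file.getvalue())
```

Every analyst who needs a different criterion ends up with another program and another file of this kind, which is how the number of extracts grows into the management problem described above.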
[Slide: Productivity Issues. Points: duplicated effort; multiple technologies; obsolete reports; no metadata.]

[Slide: Data Quality Issues. Points: no common time basis; different calculation algorithms; different levels of extraction; different levels of granularity; different data field names; different data field meanings; missing information; no data correction rules; no drill-down capability.]

Data Extract Program (continued)

Productivity Issues with Extract Processing

The productivity issues in an extract processing environment are listed below:
• Extract effort is duplicated, because multiple extracts access the same data and use mainframe resources unnecessarily.
• The program designed to access the extracted data must encompass all technologies employed by the source data.
• A report cannot always be reused, because business structures change.
• There is no common metadata providing a standard way of extracting, integrating, and using the data.

Data Quality Issues with Extract Processing

The data quality issues in an extract processing environment are listed below:
• The data has no common time basis, so users cannot compare query results with confidence. The data extracts may have been taken at different points in time.
• Each data extract may use a different algorithm for calculating derived and computed values. This makes the data difficult for managers to evaluate, compare, and communicate, because they may not know the methods or algorithms used to create the data extract or reports.
• Data extract programs may use different levels of extraction.
• Access to external data may not be consistent, and the granularity of the external data may not be well defined.
• Data sources may be difficult to identify, and data elements may be repeated on many extracts.
• The data field names and values may have different meanings in the various systems in the enterprise (lack of semantic integrity).
• There are no data correction rules to ensure that the extracted data is correct and clean.
• The reports provide data rather than information, and offer no drill-down capability.

[Slide: From Extract to Warehouse DSS. Figure labels: internal and external systems, data warehouse, decision makers. Points: controlled; reliable; quality information; single source of data.]

[Slide: Advantages of Warehouse Processing Environment. Points: no duplication of effort; no need for tools to support many technologies; no disparity in data, meaning, or representation; no time period conflict; no algorithm confusion; no drill-down restrictions.]
Transitioning from Extract Processing Environment to Warehouse Processing Environment

There was a transition from decision support using data extracts to decision support using the data warehouse. The data warehouse is a complete environment that requires skill, knowledge, and commitment to put together, particularly for a very large-scale enterprise implementation.

The data warehouse environment is more controlled and therefore more reliable for decision support than an extract environment. It supports all of your decision support requirements by providing high-quality information, made available by accurate and effective cleansing routines, consistent and valid data transformation rules, and documented presummarization of data values. It contains a single source of accurate, reliable information that can be used for analysis.

Advantages of the Warehouse Processing Environment over the Extract Processing Environment

The advantages of the warehouse processing environment are listed below:
• No duplication of effort
• No need to consider using a query and reporting tool that supports more than one technology
• No disparity with the data and its meaning
• No disparity with the way data is represented
• No conflict over the time periods employed
• No contention over the algorithms that have been used
• No restriction on drill-down capabilities
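As a small illustration of what a consistent transformation rule might look like, the Python sketch below applies one rule to the same customer's telephone number as held in two source systems, echoing the telephone-number discrepancy described earlier in this lesson. The system names, record layouts, and the digits-only rule are assumptions made for this example, not part of any Oracle product.

```python
import re


def standardize_phone(raw):
    """Hypothetical cleansing rule: keep only the digits of a telephone number."""
    return re.sub(r"\D", "", raw)


# The same customer as recorded by two hypothetical operational systems.
billing_record = {"cust_id": 42, "phone": "6175550100"}
support_record = {"cust_id": 42, "phone": "(617) 555-0100"}

# Before the rule is applied, the two representations do not match.
same_before = billing_record["phone"] == support_record["phone"]

# After the same rule is applied to both sources, they agree.
same_after = (
    standardize_phone(billing_record["phone"])
    == standardize_phone(support_record["phone"])
)

print(same_before, same_after)
```

Because the rule is defined once and applied to every source during loading, users of the warehouse see a single representation instead of one per extract.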
[Slide: Business Motivators. Points: know the business; reinvent to face new challenges; invest in products; invest in customers; retain customers; invest in technology; improve access to business information; be profitable; provide superior services and products.]

[Slide: Business Motivators. Points: provide supporting information systems; get quality information to reduce costs, streamline the business, and improve margins.]

Business Drivers for Data Warehouses

Businesses in the nineties face challenges such as regulatory control, competition, market maturity, product differentiation, customer behavior, and accelerated product life cycles, all of which require businesses to develop market awareness, responsiveness, adaptability, innovation, efficiency, and quality.

Critical Success Factors for a Dynamic Business Environment

In order to succeed in an ever-changing business environment, a company must:
• Know both the market they are in and their business (internally and externally).
• Reinvent themselves to face new challenges. This may involve changing product requirements, diverse and effective services, or even changes in internal organizational structures.
• Invest in research and development of new product channels.
• Invest in high-value customers who contribute greater returns to the business.
• Retain existing customers and attract new customers.
• Invest in new technology to support business needs.
• Improve access to information so that they can make rapid decisions, based on an accurate picture of the business.
• Be profitable. At the same time, they must be able to invest in resources for the future, such as technology and people.
• Provide superior services and products to keep market share and maintain income.

Information Needed to Ensure Success

To support these strategies, a business needs:
• Access to consistent and high-quality information on the behaviors of the business and the external markets, so that they can constantly monitor the state of the business.
• Information that can help to reduce costs, streamline the business, and improve margins.

[Slide: Technological Advances. Points: parallelism (hardware, operating system, database, query, index, applications); large databases; 64-bit architectures; indexing techniques; affordable, cost-effective open systems; robust warehouse tools; sophisticated end user tools.]
Technology Needed to Support the Business Needs

Today’s information technology climate provides you with cost-effective computing resources in the hardware and software arena, Internet and intranet solutions, and databases that can hold very large volumes of data for analysis, using a multitude of data access technologies.

Technological Advances Enabling Data Warehousing

Technology (specifically open systems technology) is making it affordable to analyze vast amounts of data, and hardware solutions are now more cost-effective.

Parallelism

Recent advances in parallelism have benefited all aspects of computing:
• Hardware environment
• Operating system environment
• Database management systems and all associated database operations
• Query techniques
• Indexing strategies
• Applications

Other Factors

• Very large volumes of data can be managed, for warehouses greater than one terabyte in size.
• Recently introduced 64-bit architectures are increasing server capacity and speed.
• Improved indexing techniques (bitmap index, hash index, star join) provide rapid access to data.
• Warehouse tools are becoming more robust and less expensive.
• Licensing strategies are more effective and affordable.
• Open systems are available.
• Sophisticated, user-friendly, and intuitive tools are available to the user community for all types of data warehouse access.

[Slide: Current Situation and Growth. Charts: current revenue and projected growth by region (USA, Europe, APAC, Other), 1996 and 2001.]
[Slide: Growth Motivators and Inhibitors. Motivators: successful implementations; decreased risk; robust extraction software; improving price-to-performance ratios; improved staff training. Inhibitors: Year 2000 compliance; skills shortage; lack of integrated metadata; data cleaning cost.]

Current Situation and Growth of Data Warehousing

Data warehouses are becoming increasingly popular. The statistics for the estimated growth of data warehousing are compelling. These figures are not specific to Oracle but are industry wide.

Revenues

A recent report has shown that in 1996 data warehouse revenues (which include hardware, software, and people-provided services) totaled $8 billion (U.S.). It is forecast that in 2001 this figure will rise to $23 billion (U.S.), assuming a compound annual growth rate of around 20% per year.

Geography

Most data warehouse implementations exist in the U.S., with Europe following close behind, and then Asia Pacific.
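The revenue figures can be checked with a short calculation. The Python sketch below uses only the two revenue figures and the two years quoted in the Revenues paragraph; the variable names are illustrative. It computes the compound annual growth rate implied by those figures, and what a flat 20% rate would produce over the same period.

```python
start_revenue = 8.0   # 1996 revenue, in billions of U.S. dollars
end_revenue = 23.0    # 2001 forecast, in billions of U.S. dollars
years = 2001 - 1996   # five compounding periods

# Compound annual growth rate implied by the two quoted figures.
implied_cagr = (end_revenue / start_revenue) ** (1 / years) - 1

# Revenue in 2001 if growth were exactly 20% per year.
projected_at_20_percent = start_revenue * 1.20 ** years

print(f"implied CAGR: {implied_cagr:.1%}")
print(f"2001 revenue at 20% per year: {projected_at_20_percent:.1f} billion")
```

Under these inputs the implied rate is a little over 23% per year, while exactly 20% per year would give just under $20 billion, so “around 20%” is best read as a rounded figure.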
Growth Motivators

These include:
• Increased successful implementations
• Decreased risk with vendors supplying a total solution
• More robust and functional extraction software
• Improved (and improving) price-to-performance equipment ratios
• Improved training for IT staff

Growth Inhibitors

These may include:
• Year 2000 compliance
• Shortage of skills in specific areas of data warehousing
• The lack of integrated metadata components
• The labor-intensive commitment to the data cleaning function and its corresponding dollar and time cost

Enterprisewide Implementations and Data Marts

Enterprise data warehouses are positioned to dominate the data warehousing business, compared with the smaller data mart implementations that are specific to departments or specific functional requirements.

[Slide: Typical Uses of a Data Warehouse. Chart: percentage market coverage (0 to 40) for Financial, Manufacturing, Telecom, Retail, and Others. Industries listed: airline; banking; health care; investment; insurance; retail; telecommunications; manufacturing; credit card suppliers; clothing distributors.]
Typical Uses of a Data Warehouse

The requirements of a business can be met by employing a data warehouse solution, which collects data from internal business operations and from outside organizations to provide a single source of reliable data for analysis.

Typical Users of a Data Warehouse

Many industries employ data warehouses:
• Airlines, for aircraft deployment, analysis of route profitability, frequent flyer promotions, and maintenance
• Banking, for trend analysis, promotion of products and services, and customer service
• Health care, for analysis and cost reduction
• Investment and insurance companies, for planning, customer analysis, risk assessment, and portfolio management
• Retail stores, for trend analysis, buying pattern analysis, promotions, customer profiling, and pricing
• Telecommunications, for analysis and for product and service promotions

Other industries that currently use data warehouse solutions are manufacturers, credit card issuers, and clothing distributors.

Figures show that the highest proportion of revenues in data warehousing is spent by the financial services, retail, telecommunications, and manufacturing industries.
Summary

This lesson covered the following topics:
• Describing why an online transaction processing (OLTP) system is not suitable for complex analysis
• Describing how extract processing for decision support querying led to data warehouse solutions employed today
• Explaining why businesses are driven to employ data warehouse technology
• Identifying some of the industries that employ data warehouses

[Slide: Practice 2-1 Overview. The practice covers answering questions and discussing how data warehousing meets business needs.]
Practice 2-1

1 OLTP databases hold up-to-the-minute information and are most commonly designed as read-only databases. True False

2 In the scenario below, state whether it refers to an operational system or an analytical processing system.
“Show me how a specific brand of printer is selling throughout different parts of the United States and how this specific brand of printer is selling since it was first introduced into my stores.”
This scenario refers to:
a An operational system
b An analytical processing system

3 Who is the target audience for the data warehouse?
a The business community in the organization
b IT professionals
c Data-entry clerks
d None of the above
e All of the above

4 Are the following statements true or false?
a Operational systems display the following qualities:
Good performance _____
Static data contents _____
High availability _____
Unpredictable CPU use _____
b Identify the reasons why business analysis is not easy with operational systems.
Data is not structured for drill-down capability. _____
The system is not designed for querying. _____
Data analysis can be CPU-intensive. _____
Data is not integrated between systems. _____

5 In groups of three or four, discuss the questions below and present your points to the class at the end of the discussion.
a List some of the reasons that your company is considering implementing a data warehouse or data mart.
b What are some of the business problems that your company is trying to answer?
c Why is the business community in your organization unable to find the answers to their business questions based on the existing information systems?

Lesson 3: Defining Data Warehouse Concepts and Terminology

[Slide: course road map, with the “Defining DW Concepts & Terminology” block highlighted.]
Overview

The previous lesson covered how data warehousing has evolved from early management information systems to today’s decision support systems that meet a business need. This lesson defines data warehouse concepts and terminology. Note that the “Defining Data Warehouse Concepts and Terminology” block is highlighted in the course road map on the facing page.

Specifically, this lesson introduces the Oracle definition of a data warehouse and offers a general description of the properties of a data warehouse. It also identifies the standard components and tools required to build, operate, and use a data warehouse.

Objectives

After completing this lesson, you should be able to do the following:
• Identify a common, broadly accepted definition of a data warehouse
• Recognize some of the operational properties of a data warehouse
• Recognize common data warehousing terminology
• Identify the functionality associated with each component required for a successful data warehouse implementation
• Identify and position the Oracle Warehouse vision, products, and services

Definition of a Data Warehouse

“An enterprise structured repository of subject-oriented, time-variant, historical data used for information retrieval and decision support. The data warehouse stores atomic and summary data.”
Oracle Data Warehouse Method
Data Warehouse Definition
This definition of a data warehouse from the Oracle Data Warehouse Method describes many of the most significant characteristics of a data warehouse. The Oracle Data Warehouse Method was developed using experience gained from successful data warehouse projects carried out by Oracle Consulting Services. This method is discussed in Lesson 4.

Subject-Oriented
While the data in an OLTP system is stored to support a specific business process (for example, order entry or campaign management) as efficiently as possible, data in a data warehouse is stored by common subject area (for example, customer or product) for ease of access. That is because the complete set of questions to be posed to a data warehouse is never known in advance. Every question the data warehouse answers spawns new questions. Thus, the design of a data warehouse focuses on giving users easy access to the data so that current and future questions can be answered.

Time-Variant
The data warehouse contains slices of data across different periods of time. With these data slices, the user can view reports for the current period and for past periods.

Historical
A data warehouse typically contains several years’ worth of data. This is necessary to support trending, forecasting, and time-based performance reporting (for example, current year versus previous year).

Information Retrieval and Decision Support
A data warehouse is a facility for getting at information to answer questions. It is not meant for direct data entry; batch updates are the norm for refreshing data warehouses.
Atomic and Summary Data
Depending on the purpose of the data warehouse, it may contain atomic data, summary data, or both.

[Slide: Data Warehouse Properties. Diagram: a data warehouse is Subject Oriented, Integrated, Non Volatile, and Time Variant.]
[Slide: Subject-Oriented. Data is categorized and stored by business subject rather than by application. Diagram labels: OLTP Applications (Equity Plans, Shares, Insurance, Loans, Savings); Data Warehouse Subject (Customer financial information).]

Data Warehouse Properties
Bill Inmon defines a data warehouse as follows: “A Data Warehouse is a subject oriented, integrated, time variant, non volatile collection of data in support of management’s decision making process.”

Subject-Oriented
Subject-oriented data is organized around major subject areas of an enterprise, and is useful for an enterprise-wide understanding of those subjects. For example, a banking operational system keeps independent records of customer savings, loans, and other transactions. A warehouse pulls this independent data together to provide financial information.
You can access subject-oriented data related to any major subject area of an enterprise:
• Customer financial information
• Toll calls made in the telecommunications industry
• Airline passenger booking information
• Insurance claim data

The data is transformed so that it is consistent and meaningful for the warehouse.

[Slide: Integrated. Data on a given subject is defined and stored once. Diagram labels: OLTP Applications (Savings, Current accounts, Loans); Data Warehouse (Customer).]
[Slide: Time-Variant. Data is stored as a series of snapshots, each representing a period of time. Diagram labels: Time and Data for 01/97 January, 02/97 February, 03/97 March.]

Integrated
In many organizations, data resides in diverse independent systems, making it difficult to integrate into one set of meaningful information for analysis. A key characteristic of a warehouse is that data is completely integrated. Data is stored in a globally acceptable manner, even when the underlying source data is stored differently. The transformation and integration process can be time-consuming and costly.
It requires commitment from every part of the organization, particularly top-level managers who make the decisions and allocate resources and funds.

Data Consistency
You must deal with data inconsistencies and anomalies before the data is loaded into the warehouse. Consistency is applied to naming conventions, measurements, encoding structures, and physical attributes of the data.

Data Redundancy
Data redundancy at the detail level is eliminated in the warehouse environment; the warehouse contains only data that is physically selected and moved into it. However, selective and deliberate redundancy in the form of aggregates and summaries is required in the warehouse to improve the performance of queries, especially drill-down analysis.

Time-Variant
Warehouse data is by nature historical; it does not usually contain the current transactional data. Data is represented over a long time horizon, from two to ten years, compared with one to three months of data for a typical operational system. The data allows for analysis of past and present trends, and for forecasting using “what-if” scenarios.

Time Element
The data warehouse always contains a key element of time, such as quarter, month, week, or day, that indicates when the data was loaded. The date may be a single snapshot date, such as 10-JAN-97, or a range, such as 01-JAN-97 to 31-JAN-97.

Snapshots by Time Period
Warehouse data is essentially a series of snapshots by time periods that do not change.

Special Dates
A time dimension usually contains all the dates required for analysis, including special dates like holidays and events.
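As a rough illustration of the time element and special dates described above, the following Python sketch builds a small time dimension with one row per day. The function name, the row layout, and the holiday entry are assumptions made for this example; they are not part of the Oracle Data Warehouse Method.

```python
from datetime import date, timedelta

# Hypothetical special dates; a real warehouse would load these
# from the business calendar.
SPECIAL_DATES = {date(1997, 1, 1): "New Year's Day"}


def build_time_dimension(start, end, special_dates=SPECIAL_DATES):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date_key": day.strftime("%Y%m%d"),   # for example "19970110"
            "day": day,
            "week": day.isocalendar()[1],         # ISO week number
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
            "special": special_dates.get(day),    # None for ordinary days
        })
        day += timedelta(days=1)
    return rows


# A range such as 01-JAN-97 to 31-JAN-97 becomes 31 daily rows.
january = build_time_dimension(date(1997, 1, 1), date(1997, 1, 31))
```

Because each row carries the week, month, quarter, and year, snapshot data keyed by date can be rolled up to any of those periods without recomputing calendar logic in every query.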
[Slide: Nonvolatile. Typically data in the data warehouse is not updated or deleted. Diagram labels: Operational (Insert, Update, Delete, Read); Warehouse (Load, Read).]
[Slide: Changing Data. Diagram labels: Operational Databases, Warehouse Database, First time load, Refresh, Purge or Archive.]

Nonvolatile
Typically, data in the data warehouse is read-only. Data is loaded into the data warehouse for the first-time load, and then refreshed regularly. Warehouse data is accessed by the business users. Warehouse operations typically involve:
• Loading the initial set of warehouse data (often called the first-time load)
• Refreshing the data regularly (called the refresh cycle)

Accessing the Data
Once a snapshot of data is loaded into the warehouse, it rarely changes. Therefore, data manipulation is not a consideration at the physical design level. The physical warehouse is optimized for data retrieval and analysis.

Refresh Cycle
The data in the warehouse is refreshed; that is, snapshots are added. The refresh cycle is determined by the business users. A refresh cycle need not be the same as the grain (the level at which the data is stored) of the data for that cycle. For example, you may choose to refresh the warehouse weekly, but the grain of the data may be daily.
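The difference between the refresh cycle and the grain can be sketched in a few lines of Python. In this minimal example, which assumes a hypothetical Warehouse class and a made-up daily sales measure, each refresh runs once a week but adds seven rows, one per day, and existing rows are never updated in place.

```python
from datetime import date, timedelta


class Warehouse:
    """Append-only store: rows are added at each refresh, never updated."""

    def __init__(self):
        self._rows = []

    def refresh(self, week_start, daily_totals):
        """Load one week of data in a single refresh, keeping a daily grain.

        daily_totals holds one sales total per day (a hypothetical measure).
        """
        for offset, total in enumerate(daily_totals):
            self._rows.append((week_start + timedelta(days=offset), total))

    def rows(self):
        """Return a read-only view of the stored snapshots."""
        return tuple(self._rows)


wh = Warehouse()
wh.refresh(date(1997, 1, 6), [10, 12, 9, 14, 20, 0, 0])   # first-time load
wh.refresh(date(1997, 1, 13), [11, 13, 8, 15, 22, 0, 0])  # weekly refresh
```

After two weekly refreshes the store holds fourteen daily rows, so users can still analyze individual days even though data arrives only once a week.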
Changing Warehouse Data
The following operations are typical of a data warehouse:
• The initial set of data is loaded into the warehouse, often called the first-time load. This is the data by which you will measure the business, and the data containing the criteria by which you will analyze the business.
• Frequent snapshots of core data warehouse data are added (more occurrences), according to the refresh cycle and using data from the multiple source systems.

Warehouse data may need to be changed in other ways:
• The data you are using to analyze the business may change; the data warehouse must be kept up-to-date to keep it accurate.
• The business determines how much historical data is needed for analysis, say five years’ worth. Older data is either archived or purged.
• Inappropriate or inaccurate data values may be deleted from or migrated out of the data warehouse.

[Slide: Data Warehouse Versus OLTP]
Property            Operational               Data Warehouse
Response Time       Sub seconds to seconds    Seconds to hours
Operations          DML                       Primarily read only
Nature of Data      30-60 days                Snapshots over time
Data Organization   Application               Subject, time
Size                Small to large            Large to very large
Data Sources        Operational, Internal     Operational, Internal, External
Activities          Processes                 Analysis

[Slide: Usage Curves]
• Operational system is predictable
• Data warehouse
  – Variable
  – Random
Data Warehouse Versus Online Transaction Processing (OLTP)

Response Time and Data Operations
Data warehouses are constructed for very different reasons than online transaction processing (OLTP) systems. OLTP systems are optimized for getting data in, that is, for storing data as a transaction occurs. Data warehouses are optimized for getting data out, that is, for providing quick response for analysis purposes. Because there tends to be a high volume of activity in the OLTP environment, rapid response is critical. Data warehouse applications, in contrast, are analytical rather than operational, so slower performance is acceptable.

Nature of Data
The data stored in each database varies in nature: the data warehouse contains snapshots of data over time to support time-series analysis, whereas the OLTP system stores very detailed data for a short time, such as 30 to 60 days.

Data Organization
The data warehouse is subject specific and supports analysis, so data is arranged accordingly. For the OLTP system to support subsecond response, the data must be arranged to optimize the application. For example, an order entry system may have tables that hold each of the elements of the order, whereas a data warehouse may hold the same data but arrange it by subject, such as customer or product.

Data Sources
Because the data warehouse is created to support analytical activities, data from a variety of sources can be integrated. The operational data store of the OLTP system holds only internal data or data necessary to capture the operation or transaction.
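The Data Organization contrast described above can be illustrated with a short Python sketch. The order lines, field names, and the arrange_by_subject helper are all hypothetical; the point is only that the same order-entry data can be regrouped by subject, such as customer or product, for analysis.

```python
from collections import defaultdict

# Hypothetical order-entry records, arranged the way the application
# writes them: one record per order line.
order_lines = [
    {"order_id": 1, "customer": "Acme", "product": "Widget", "amount": 120.0},
    {"order_id": 1, "customer": "Acme", "product": "Gadget", "amount": 80.0},
    {"order_id": 2, "customer": "Birch", "product": "Widget", "amount": 60.0},
]


def arrange_by_subject(lines, subject):
    """Total the order amounts for each value of the chosen subject key."""
    totals = defaultdict(float)
    for line in lines:
        totals[line[subject]] += line["amount"]
    return dict(totals)


by_customer = arrange_by_subject(order_lines, "customer")
by_product = arrange_by_subject(order_lines, "product")
```

The order-entry layout answers "what is on order 1?" efficiently, while the subject-oriented totals answer questions such as "how much did each customer buy?" without scanning individual orders.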
Usage Curves
Operational systems and data warehouses have different usage curves. An operational system has a more predictable usage curve; the warehouse has a less predictable, more varied, and random usage curve. Access to the warehouse varies not just on a daily basis, but may even be affected by forces such as seasonal variations. For this reason, you cannot expect the operational system to handle heavy analytical queries (DSS) and continue to give good transaction rates for the minute-by-minute processing required.

[Slide: User Expectations]
• Control expectations
• Set achievable targets for query response
• Set SLAs
• Educate
• Growth and use is exponential

[Slide: Enterprisewide Warehouse]
• Large scale implementation
• Scopes the entire business
• Data from all subject areas
• Developed incrementally
• Single source of enterprisewide data
• Synchronized enterprisewide data
• Single distribution point to dependent data marts

User Expectations
The difference in response time may be significant between a data warehouse and a client-server environment fronted by personal computers.
You must control the user’s expectations regarding response. Set reasonable and achievable targets for query response time, which can be assessed and proved in the first increment of development. You can then define, specify, and agree on Service Level Agreements. If users are accustomed to fast PC-based systems, they may find the warehouse excessively slow. However, it is up to those educating the users to ensure that they are aware of just how big the warehouse is, how much data it holds, and what benefit the information brings to both the user and the business.

Exponential Growth and Use
Once implemented, data warehouses continue to grow in size. Each time the warehouse is refreshed, data is added, deleted, or archived. The refresh happens on a regular cycle. Successful data warehouses grow very quickly, perhaps on the order of gigabytes a month and terabytes over time. Once the success of the warehouse is proven, the use increases dramatically. Users who may have been skeptical want access. Use often grows faster than expected.

Enterprisewide Data Warehouse
To summarize, an enterprisewide warehouse stores data from all subject areas within the business for analysis by end users. The scope of the warehouse is the entire business and all operational aspects within the business. An enterprisewide warehouse is normally (and should be) created through a series of incrementally developed solutions. Never create an enterprisewide data warehouse under one project umbrella; it will not work. With an enterprisewide data warehouse, all users access the warehouse, which provides:
• A single source of corporate enterprisewide data
• A single source of synchronized data in the enterprisewide warehouse for each subject area
• A single point for distribution of data to dependent data marts
[Slide: Data Warehouses Versus Data Marts]
Property              Data Warehouse     Data Mart
Scope                 Enterprise         Department
Subjects              Multiple           Single-subject, LOB
Data Source           Many               Few
Size (typical)        100 GB to > 1 TB   < 100 GB
Implementation time   Months to years    Months

Data Warehouse Versus Data Mart

Definition
A data mart is a subset of data warehouse fact and summary data that provides users with information specific to their requirements.

Scope
A data warehouse deals with multiple subject areas and is typically implemented and controlled by a central organizational unit such as the Corporate Information Technology group. It is often called a central or enterprise data warehouse.

Subjects
A data mart is a simpler form of a data warehouse designed for a single line of business (LOB) or functional area such as sales, finance, or marketing.

Data Source
A data warehouse typically assembles data from multiple source systems. A data mart typically assembles data from fewer sources.

Size
Data marts are differentiated from data warehouses not by size, but by use and management.

Implementation Time
Data marts are typically smaller and less complex than data warehouses and therefore are typically easier to build and maintain.
A data mart can be built as a “proof of concept” step toward the creation of an enterprisewide warehouse.

[Slide: Dependent Data Mart. Diagram labels: Operational Systems, Flat Files, External Data, Data Warehouse, Data Marts (Marketing, Sales, Finance, Human Resources).]
[Slide: Independent Data Mart. Diagram labels: Operational Systems, Flat Files, External Data, Sales or Marketing.]

Dependent and Independent Data Marts
Data marts can be categorized into two types: dependent and independent. The categorization is based primarily on the data source that feeds the data mart.

Dependent Data Mart
Dependent data marts have the following characteristics:
• The source is the warehouse. Dependent data marts rely on the data warehouse for content.
• The extraction, transformation, and transportation (ETT) process is easy. Dependent data marts draw data from a central data warehouse that has already been created. Thus, the main effort in building a mart, the data cleansing and extraction, has already been performed. The dependent data mart simply requires data to be moved from one database to another.
• The data mart is part of the enterprise plan.
Dependent data marts are usually built to achieve improved performance and availability, better control, and lower telecommunication costs resulting from local access to data relevant to a specific department.

Independent Data Mart
Independent data marts are stand-alone systems built from scratch that draw data directly from operational sources, external sources, or both. Independent data marts have the following characteristics:
• The sources are operational systems and external sources.
• The ETT process is difficult. Because independent data marts draw data from unclean or inconsistent data sources, efforts are directed toward error processing and integration of data.
• The data mart is built to satisfy analytical needs. The creation of independent data marts is often driven by the need for a quick solution to analysis demands.

[Slide: Data Warehouse Terminology]
• Operational data store (ODS): Stores tactical data from production systems that are subject-oriented and integrated to address operational needs
• Metadata
Data Warehouse Terminology

Operational Data Store
The operational data store (ODS) stores tactical data from production systems in a subject-oriented, integrated form to address operational needs. The detailed, current information in the ODS is transactional in nature, updated frequently (at least daily), and held for only a short period of time. The objectives of the ODS are to:
• Integrate information from the production systems
• Relieve the production systems of reporting and analysis demands
• Provide access to current data

In addition, the ODS can be a data source for the data warehouse and may be accessed with the same tools used to access the data warehouse and data marts. The goal is to provide a tactically structured, efficient information processing environment that satisfies the analysis and reporting needs of the day-to-day operations of the business.

Metadata
Information about data, derived directly from the business owners and users, is maintained to support operations and use of the data warehouse.

[Slide: Data Warehouse Terminology. Diagram labels: Enterprise data warehouse, Architecture, Business area warehouse, Data integration, Source data.]
Architecture
A set of rules or structures providing a framework for the overall design of a system or product.

Technical Infrastructure
The technologies, platforms, databases, gateways, and other components necessary to make the architecture functional within the corporation.

Data Access Environment
An environment that includes the front-end data-access tools and technologies, training on how to use these tools and technologies, the implementation of metadata, and the training to navigate through the metadata.

[Slide: Methodology]
• Ensures a successful data warehouse
• Encourages incremental development
• Provides a staged approach to an enterprisewide warehouse
  – Safe
  – Manageable
  – Proven
  – Recommended

[Slide: Modeling]
• Warehouses differ from operational structures:
  – Analytical requirements
  – Subject orientation
• Data must map to subject oriented information:
  – Identify business subjects
  – Define relationships between subjects
  – Name the attributes of each subject
• Modeling is iterative
• Modeling tools are available
Components of a Data Warehouse
Although every warehouse implementation varies, for every data warehouse there are:
• Implementation methodologies
• Design and modeling considerations
• Operational and management processes to be developed
• Data management considerations
• User access reporting requirements and tools to be chosen

These are components and requirements that remain constant within any warehouse development and production environment.

Methodology
Employing a methodology for the development of any system is always important, and even more so in a warehouse environment. The warehouse is such a big investment, in every kind of resource, that its success is essential. To avoid failure of the warehouse implementation, you must employ a methodology and keep to it. Failure generally has one of two causes: the warehouse is not delivered on time, or the warehouse fails to deliver what the business users need. A good method helps to manage expectations by identifying clear deliverables.

Modeling
The warehouse may be modeled from scratch or from an existing operational model that defines the operational systems. It is more common (and recommended) to model from scratch, referencing the source systems available and identifying any gaps in data needs. The data warehouse is modeled in a different way from an operational system. First, the structure needs to take into account the way data is analyzed, and the schema is created accordingly. Second, the warehouse is based upon subjects (not functions), and it is these subject areas that form the basis of the model. Subject areas are modeled and implemented one at a time.

Modeling Tools
You can use specific modeling tools, such as Oracle Designer/2000, to model the warehouse initially and facilitate iterative development.
[Slide: Extraction, Transformation, and Transportation. Diagram labels: OLTP Databases, Staging File, Warehouse Database. Purchase specialist tools, or develop programs.]
• Extraction: select data using different methods
• Transformation: validate, clean, integrate, and time stamp data
• Transportation: move data into the warehouse

[Slide: Data Management]
• Efficient database server and management tools for all aspects of data management
• Imperatives
  – Productive
  – Flexible
  – Robust
  – Scalable
  – Efficient
• Hardware, operating system and network management

Extraction, Transformation, and Transportation (ETT)
These processes are fundamental to the creation of quality information in the data warehouse. You take data from source systems; clean, verify, validate, and convert it into a consistent state; then move it into the warehouse.
• Extraction: The process of selecting specific operational attributes from the various operational systems.
• Transformation: The process of integrating, verifying, validating, cleaning, and time stamping the selected data into a consistent and uniform format for the target databases.
Rejected data is returned to the data owner for correction and reprocessing.
• Transportation: The process of moving data from an intermediate storage area into the target warehouse database.

ETT Tools
Specialized tools make these tasks comparatively easy to set up, maintain, and manage, compared to in-house developed programs. Specialized tools are available from Oracle with the Data Mart Suite. However, specialized tools can be an expensive option, which motivates many warehouse projects to employ customized ETT programs written in COBOL, C++, PL/SQL, or other programming languages or application development tools.

Data Management
The heart of the warehouse is the database management system (or Server, in the case of Oracle), which must be:
• Productive
• Flexible
• Robust
• Scalable
• Efficient

The server must possess many other properties (they are considered in a later lesson). The warehouse environment must also be capable of managing the hardware, operating system, and overall network infrastructure. Warehousing environments normally employ a relational database management system (RDBMS) or server.

Tools
Oracle provides tools (such as Oracle Enterprise Manager) that can be used to manage and control access to the warehouse environment.

[Slide: Data Access and Reporting. Diagram labels: Simple Queries, Forecasting, Drill-down, Warehouse Database.]
• Tools that retrieve data for business analysis
• Imperatives
  – Ease of use
  – Intuitive
  – Metadata
  – Training
• More than one tool may be required
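The three ETT steps defined earlier in this lesson (extraction, transformation, transportation) can be sketched as a minimal Python pipeline. This is an illustration under stated assumptions rather than a description of any Oracle tool: the source rows, the gender encoding table, and the function names are all hypothetical, and a plain list stands in for the target warehouse database.

```python
from datetime import date

# Hypothetical rows from two operational systems that encode
# the same attribute differently.
SOURCE_ROWS = [
    {"system": "savings", "customer": "C1", "gender": "M", "balance": "100.50"},
    {"system": "loans", "customer": "C2", "gender": "female", "balance": "250"},
    {"system": "loans", "customer": "C3", "gender": "?", "balance": "75"},
]

# Assumed single encoding used in the warehouse.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def extract(rows, attributes):
    """Extraction: select only the attributes the warehouse needs."""
    return [{name: row[name] for name in attributes} for row in rows]


def transform(rows, load_date):
    """Transformation: validate, clean, and time stamp each row.

    Rows that fail validation are collected separately so they can be
    returned to the data owner for correction.
    """
    clean, rejected = [], []
    for row in rows:
        code = GENDER_CODES.get(row["gender"].strip().lower())
        if code is None:
            rejected.append(row)
            continue
        clean.append({
            "customer": row["customer"],
            "gender": code,
            "balance": float(row["balance"]),
            "load_date": load_date,
        })
    return clean, rejected


def transport(rows, warehouse):
    """Transportation: move the staged rows into the target store."""
    warehouse.extend(rows)
    return len(rows)


warehouse = []
staged, rejects = transform(
    extract(SOURCE_ROWS, ["customer", "gender", "balance"]), date(1997, 1, 10)
)
loaded = transport(staged, warehouse)
```

Keeping the rejected rows separate from the staged rows mirrors the rule that rejected data goes back to its owner instead of being loaded in an inconsistent state.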
Data Access and Reporting
Every warehouse implementation requires tools for end-user access. The tools chosen depend upon the user’s requirements for information. They range from simple reporting tools, through more complex OLAP tools, to highly advanced data mining tools. Ultimately, they should be easy to use and provide flexibility. There are hundreds of access and query tools available.

Tools
It is important that the tools are intuitive and easy to use. It is imperative that the warehouse data is presented to the user in a meaningful, business-specific manner, one that the user can easily interpret. Metadata provides the user with these data descriptions and navigation information.
Users have different query requirements, and one query tool may not fit all requirements. Users may need to perform simple to complex business modeling; trend analysis using data spanning time periods; complex drill-down; simple queries on prepared summary information; what-if analysis; detailed trend analysis and forecasting; and data mining.
Note: Data warehouse implementors, or WTI partners, may need to provide extensive and intensive training in the use and optimization of selected extraction and reporting tools. If the tools are SQL-based, for example, the user needs to know how many tables or indexes can be used before execution impedes system performance.
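To make two of the query requirements named above more concrete, a simple query on summary information and a drill-down, here is a small Python sketch. The sales rows, the column layout, and the summarize helper are hypothetical and are not taken from any Oracle tool; a real access tool would run equivalent queries against the warehouse database.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, product, quarter, sales amount).
SALES = [
    ("North America", "Widgets", "Q1", 120.0),
    ("North America", "Widgets", "Q2", 180.0),
    ("North America", "Gadgets", "Q2", 90.0),
    ("Europe", "Widgets", "Q2", 60.0),
]


def summarize(rows, key_indexes):
    """Total the sales amount for each combination of the chosen columns."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        totals[key] += row[3]
    return dict(totals)


# Simple query on summary information: total sales by region.
by_region = summarize(SALES, [0])

# Drill-down: break one region's total into its products.
north_america = [row for row in SALES if row[0] == "North America"]
by_product = summarize(north_america, [1])

print(by_region)
print(by_product)
```

The drill-down reuses the same summarizing step on a narrower slice of the detail data, which is why the warehouse must retain the detail rows and not only the prepared summaries.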
Slide: Oracle Warehouse Components
[Diagram labels: Any Source (Operational data, External data); Any Data (Relational / Multidimensional; Text, image; Spatial; Web; Audio, video); Any Access (Relational tools, OLAP tools, Applications/Web)]

Oracle Warehouse Vision, Products, and Services

Oracle Warehouse Framework
Oracle Warehouse is a comprehensive program involving products, partners, and services.

Loading Any Source
Oracle and a variety of third parties provide solutions to extract and load data from multiple data sources into the warehouse. You can gather data from multiple sites and multiple applications.

Managing Any Data
Oracle warehouses using Oracle7, Oracle8, and Oracle8i relational database management systems can store any data, including atomic, summary, and transient data. You can also store metadata definitions about the data.

Analyzing Data Using Any Access
Oracle Warehouse presents summarized information using client-server and Web-based tools.
• Relational analysis tools: Oracle provides tools for ad hoc query of relational data as well as the development of custom data warehouse applications. Discoverer is an ad hoc query tool that provides decision support and analysis capabilities through a graphical front end.
• Online Analytical Processing (OLAP) tools: The Oracle Warehouse supports multidimensional data, which is a summarized “cube” of information that allows sophisticated analysis across a variety of different dimensions, such as product, time, and region. For OLAP analysis of multidimensional data, Oracle Express Analyzer is an object-oriented ad hoc query tool. To build custom query and reporting applications, Oracle provides Express Objects, an object-oriented OLAP development environment.

Slide: Oracle Data Mart Suite
[Diagram labels: Data Modeling (Oracle Data Mart Designer); Data Extraction (Oracle Data Mart Builder); Data Management (Oracle Enterprise Manager); Data Access & Analysis (Discoverer & Oracle Reports); OLTP Databases; OLTP Engines; Warehousing Engines; Data Mart Database; Oracle8; SQL*PLUS]

Slide: Data Mart Implementation with the Oracle Data Mart Suite
• Oracle Enterprise Server
• Oracle Enterprise Manager
• Oracle Data Mart Designer
• Oracle Data Mart Builder
• Oracle Discoverer
• Oracle Web Application Server
• Oracle Reports
Oracle Warehouse Products

Oracle Data Mart Suite
This suite consists of seven products, all of which are used in this course except Oracle Web Application Server and Oracle Reports. Each of the products in the Oracle Data Mart Suite (ODMS) plays a role in the implementation or use of the data mart. ODMS delivers an integrated package with the software and documentation needed to implement a data mart quickly and easily. ODMS consists of these products:
• Oracle Enterprise Server
• Oracle Enterprise Manager
• Oracle Data Mart Designer
• Oracle Data Mart Builder
• Oracle Discoverer
• Oracle Web Application Server
• Oracle Reports and Reports Server

Slide: Oracle Warehouse Builder Architecture
[Diagram labels: Warehouse Builder (Code Generation, Metadata, Workflow); Metadata; Sources; Filter; Transform; Extraction Facilities (Loader; Remote SQL; Gateways: OLE-DB/ODBC, Mainframe, Specialized; ERP Data: SAP, PeopleSoft, Oracle); PL/SQL, Java Transforms; Transform Driver; PL/SQL, Java Wrapper; External Functions; Target Tables; Oracle8i]
Oracle Warehouse Builder
Oracle Warehouse Builder (OWB) is the new Oracle integrated product for the design, building, and management of enterprise data warehouses. Oracle Warehouse Builder rolls all the functionality of multiple stand-alone data warehousing tools into a common, fully integrated Java-based graphical user environment. This team- and project-oriented visual tool covers visual modeling and design, data extraction, movement and loading, aggregation, metadata management, metadata integration with analysis tools, and warehouse administration: the tasks IT shops need to perform to design, build, and manage data warehouses. OWB consists of the following components:
• OWB Repository
• OWB User Interface
• OWB Warehouse Administrator
• OWB Software Development Kit
• Oracle Integrator for SAP and for PeopleSoft

Slide: Oracle Business Intelligence Tools
[Diagram labels: IS develops user’s Views; Business users; Analysts; Current; Tactical; Strategic; Oracle Reports; Oracle Discoverer; Oracle Express]

Slide: The Tool for Each Task
Tool               Task                       Question
Oracle Reports     Production reporting       What were sales by region last quarter?
Oracle Discoverer  Ad hoc query and analysis  What is driving the increase in North American sales?
Oracle Express     Advanced analysis          Given the rapid increase in Web sales, what will total sales be for the rest of the year?
Oracle Business Intelligence Tools
Business intelligence is a set of concepts, methods, and processes to improve business decisions using information from multiple sources and applying experience and assumptions to develop an accurate understanding of business dynamics.
Different end users need different tools and access to different data with targeted capabilities. These tools must be able to meet the demands of particular needs. However, they should also work together, and must be able to evolve with users as their needs change. Oracle offers integrated, best-of-breed tools across the entire business intelligence spectrum.
Every enterprise has a spectrum of business intelligence requirements. At a basic level, these business intelligence requirements, or tasks, can be associated with particular kinds of questions.

Task: Production reporting. The creation and publication of “snapshot” reports of data to answer the question “what happened?” This is the kind of reporting on which businesses run, for example, weekly sales reports.
Business Question: “What were sales by region last quarter? How many widgets did I produce this week?”
Business Intelligence Tool: Oracle Reports

Task: Ad hoc query analysis. Certain users will need to create their own ad hoc queries to answer the question “why?”
Business Question: “What is driving the increase in North American sales?”
Business Intelligence Tool: Oracle Discoverer

Task: Advanced analysis. This includes more sophisticated analytical tasks, such as time-series analysis, forecasting, financial modeling, and multiuser “what-if” simulations.
Business Question: “Given the rapid increase in Web sales, what will total sales be for the rest of the year?”
Business Intelligence Tool: Oracle Express

Oracle Reports, Oracle Discoverer, and Oracle Express are interoperable today, providing seamless analysis across the entire business intelligence spectrum. Discoverer users are able to dynamically pass the contents of a workbook to Express, building a multidimensional cube “on the fly” and invoking the Express calculation engine for more sophisticated analysis. Conversely, Express users are able to “drill out” to Discoverer to explore the detail-level data in the relational system from data summarized in an Express cube. Oracle Reports publishes views of data from both Discoverer worksheets and Express data cubes.

Slide: Oracle Warehouse Services
[Diagram labels: Oracle Education; Oracle Consulting; Customers; Oracle Support Services]

Oracle Warehouse Services

Oracle Consulting
This service provides full life-cycle implementation services for data warehousing solutions.
Oracle Consulting has leveraged Oracle’s heavy investment in new technology development through involvement in leading-edge client engagements. It has also built knowledge repositories and problem-solving approaches in data warehousing and incorporated them in its Data Warehouse Method.
Major new programs are being planned by Oracle Consulting’s Data Warehousing Practice to help companies think about and manage their customers and their businesses in better ways. Concepts such as one-to-one marketing and balanced scorecard are brought to life with data warehousing technology and by professionals who can provide a transition from management vision to fully operational systems.

Oracle Education
This service offers a suite of products and services to meet your training needs, including instructor-led training, online interactive learning, interactive courseware, in-depth seminars, customized classes, and enterprisewide performance consulting services. Oracle offers courses in a variety of media, such as:
• Instructor-led training (ILT) courses run either at an Oracle Education Center or on your site
• Customized training (combining media offerings)
• Media-based training using Computer Based Training (CBT) courses

Oracle Support Services
This service offers a range of program options, enabling customers to select the best fit for their organization. Ranging from basic telephone support and Web-based systems to highly customized, on-site support, the programs include OracleFoundation, OracleMetals, OracleExpertise, and OracleLifecycle. Three global support centers and more than 90 local centers worldwide constitute a global support infrastructure that enables Oracle Support Services to provide around-the-clock, around-the-world coverage for core technology and mission-critical applications.
Summary
This lesson covered the following topics:
• Identifying a common, broadly accepted definition of the data warehouse
• Distinguishing the differences between OLTP systems and analytical systems
• Defining some of the common data warehouse terminology
• Identifying some of the elements and processes in a data warehouse
• Identifying and positioning the Oracle Warehouse vision, products, and services
Practice 3-1 Overview
This practice covers the following topics:
• Answering questions regarding data warehousing concepts and terminology
• Discussing some of the data warehouse concepts and terminology

Practice 3-1
1 Indicate whether the following statements about warehouse data are true or false.
     Statement                                         True   False
  a  Data is organized by time.
  b  Data is always stored in a relational database.
  c  Data relates to business-specific areas.
  d  Data is sometimes integrated.
  e  Data is replaced according to a refresh cycle.
  f  Data warehouses may contain any type of data.
2 _______ is a set of rules or structures providing a framework for the overall design of a system or product.
  a Technical infrastructure
  b Data access environment
  c Architecture
3 The ________ is closely related to the architecture and consists of the technologies, platforms, databases, gateways, and other components necessary to make the architecture functional within the corporation.
  a Data access environment
  b Technical infrastructure
  c Data warehouse
4 A telco company needs to understand its network traffic to better pinpoint frequent trouble spots and predict network expansion and usage. Storing call detail records and summarizing them by switch and trunk groups, among other things, in another environment will satisfy this need. Which of the following are you going to design?
  a Operational data store (ODS)
  b Data warehouse
5 An online bookstore has customers in its Sales Order System and in its Marketing System. These customers do not match between systems, because Marketing staff do not always update the Marketing System with current and complete customer data. Also, the bookstore wants to develop profiles of its customers according to buying patterns and summarize product sales to get the feedback necessary to improve marketing programs and promotions. Which of the following are you going to design?
  a Operational data store (ODS)
  b Data warehouse
6 Discussion: Discuss the questions below about data warehousing concepts and terminology and present your points to the class at the end of the discussion.
  a Discuss whether a data warehouse, enterprisewide data warehouse, independent data mart, dependent data mart, or operational data store is most suitable for your company’s needs.
  b Discuss how the pieces of Inmon’s classic definition of a data warehouse, “A data warehouse is a subject oriented, integrated, time variant, non volatile collection of data in support of management’s decision making process,” apply to your environment.
  c How will your recommendations in question 6a above deliver benefits?

Lesson 4: Driving Implementation Through a Methodology
Slide: Overview
[Diagram: course road map. Defining DW Concepts & Terminology; Planning for a Successful Warehouse; Meeting a Business Need; Choosing a Computing Architecture; Planning Warehouse Storage; Modeling the Data Warehouse; ETT (Building the Warehouse); Analyzing User Query Needs; Supporting End User Access; Managing the Data Warehouse; Project Management (Methodology, Maintaining Metadata)]

Slide: Objectives
After completing this lesson, you should be able to do the following:
• Explain the different approaches to warehouse development and the benefits of an incremental approach
• Identify the purpose of the Oracle Method
• Discuss the purpose and fundamental elements of the Oracle Consulting Data Warehouse Method
• Identify the Data Warehouse Method as a series of processes and approaches
• Discuss the objectives of the Oracle Warehouse Technology Initiative

Overview
The previous lesson covered data warehouse concepts and terminology. This lesson discusses the need to drive a data warehouse implementation project through a methodology. Note that the “Project Management” block is highlighted in the course road map on the facing page.
Specifically, this lesson introduces the Oracle Data Warehouse Method, a methodology employed by Oracle Consulting Services for incremental development of a total warehouse solution by using a phased development approach. Partnering initiatives launched by Oracle are also described.
Objectives
After completing this lesson, you should be able to do the following:
• Explain the different approaches to warehouse development and the benefits of an incremental approach to development
• Identify the purpose of the Oracle Method
• Discuss the purpose and fundamental elements of the Oracle Consulting Data Warehouse Method
• Identify the Data Warehouse Method as a series of processes and approaches
• Discuss the objectives of the Oracle Warehouse Technology Initiative

Slide: “Big Bang” Approach
[Diagram: Analyze enterprise requirements; Build enterprise data warehouse; Report in subsets or store in data marts]

Slide: “Big Bang” Approach: Advantages and Disadvantages
• Advantages:
  – The only real advantage is where the warehouse is being built as part of another major project or program, such as reengineering, and they are dependent on each other
  – Having a “big picture” of the data warehouse before starting the data warehousing project
• Disadvantages:
  – Involves a high risk, takes a longer time
  – Runs the risk of needing to change requirements
Warehouse Development Approaches
The most challenging aspect of data warehousing lies not in its technical difficulty, but in choosing the best approach to data warehousing for your company’s structure and culture, and in dealing with the organizational and political issues that will inevitably arise during implementation.

“Big Bang” Approach
Historically, IT departments attempted to provide enterprisewide data warehouse implementations in a single project. Data warehouse development is a huge task, and it is a mistake to assume that the solution can be built all at once. The time required to develop the warehouse often means that user requirements and technologies change before the project is completed.
In this approach, you do the following:
1 Analyze the entire information requirement for the organization
2 Build the enterprise data warehouse to support these requirements
3 Build access, as required, either directly or by subsetting to data marts

Advantages of the “Big Bang” Approach
This approach has few real advantages over other approaches, and it should be avoided in most cases.
• The only real advantage is where the warehouse is being built as part of another major project or program, such as reengineering, and they are dependent on each other
• Having a “big picture” of the data warehouse before starting the data warehousing project

Disadvantages of the “Big Bang” Approach
The following are the disadvantages of this approach:
• Involves a high risk
• Takes a longer time to deliver any perceived business benefit
• Runs the risk of needing to change requirements, which will change during analysis
Slide: Incremental Approach to Warehouse Development
[Diagram: repeated iterations of the phases Strategy, Definition, Analysis, Design, Build, and Production]
• Multiple iterations
• Shorter implementations
• Validation of each phase

Slide: Benefits of an Incremental Approach
• Delivers a strategic data warehouse solution through incremental development efforts
• Provides extensible, scalable architecture
• Supports the information needs of the enterprise organization
• Quickly provides business benefits and ensures a much earlier return on investment
• Allows a data warehouse to be built one subject or application area at a time
• Allows the construction of an integrated data mart environment

Incremental Approach
The incremental approach manages the growth of the data warehouse by developing incremental solutions that comply with the full-scale data warehouse architecture. Rather than starting by building an entire enterprisewide data warehouse as a first deliverable, start with just one or two subject areas, implement them as a scalable data mart, and roll them out to your end users.
Then, after observing how users are actually using the warehouse, add the next subject area or the next increment of functionality to the system. This is also an iterative process. It is this iteration that keeps the data warehouse in line with the needs of the organization.
Think big and start small. In other words, your strategy identifies the enterprisewide warehouse, which is delivered in small increments and in short timeframes.

Benefits
Some of the benefits of the incremental approach to warehouse development are listed below.
• Delivers a strategic data warehouse solution through incremental development efforts
• Provides extensible, scalable architecture
• Supports the information needs of the enterprise organization
• Quickly provides business benefit and ensures a much earlier return on investment
• Allows a data warehouse to be built one subject or application area at a time
• Allows the construction of an integrated data mart environment

Slide: Top-Down Approach
[Diagram labels: Legacy data; Operations data; External data sources; Data warehouse; Data marts; Sales; Marketing; Users]

Slide: Top-Down Approach: Advantages and Disadvantages
• Advantages:
  – Provides a relatively quick implementation and payback
  – Offers significantly lower risk
  – Emphasizes high-level business needs
  – Achieves synergy among subject areas
• Disadvantages:
  – Requires an increase in up-front costs
  – Difficult to define the boundaries
  – May not be suitable unless the client needs cross-functional reporting
Top-Down Incremental Approach
This is the fundamental approach recommended for data warehousing projects in the Oracle Data Warehouse Method. In this approach, you do the following:
1 Analyze enterprise requirements to develop a conceptual information model and warehouse road map, including identifying and prioritizing subject areas.
2 Complete a model of a selected subject area, map to available data, and perform a source system analysis.
3 Implement base technical architecture and establish metadata, extraction, and load processes for the initial subject area.
4 Create and populate the initial subject area data mart within the overall warehouse framework.

Advantages of the Incremental Top-Down Approach
This approach has the following advantages:
• Provides a relatively quick implementation and payback. Typically, the scoping, definition study, and initial implementation are scaled down so that they can be completed in six to seven months.
• Offers significantly lower risk because it avoids being as analysis heavy as the “big bang” approach.
• Emphasizes high-level business needs.
• Achieves synergy among subject areas. Maximum information leverage is achieved as cross-functional reporting and a single version of the truth are made possible.
Disadvantages of the Incremental Top-Down Approach
This approach has the following disadvantages:
• Requires an increase in up-front costs before the business sees any return on its investment
• Is difficult to define the boundaries of the scoping exercise if the business is global
• May not be suitable unless the client needs cross-functional reporting
Note: An enterprise data warehouse is not always the right answer, but if you are going to build an enterprise data warehouse, then this approach is by comparison the best approach.

Slide: Bottom-Up Approach
[Diagram labels: Legacy data; Operations data; External data sources; Data marts; Sales; Marketing; Data warehouse]

Slide: Bottom-Up Approach: Advantages and Disadvantages
• Advantages:
  – Appealing to IT
  – Easier to get buy-in from IT
• Disadvantages:
  – Requires source systems to encapsulate the current business processes
  – Design may be out-of-date before delivery
  – Requires reengineering for each increment
  – Solutions may be rejected by the next line of business to be involved
  – Overall benefit to the business may be minimized
Bottom-Up Incremental Approach
This approach is similar to the top-down approach, but the emphasis is on the data rather than the business benefit. Here, IT is in charge of the project, either because IT wants to be in charge or because the business has deferred the project to IT. The general steps in this approach are as follows:
1 Generally define the scope and coverage of the data warehouse.
2 Analyze the source systems that are in scope for the data warehouse.
3 Define the initial increment based on political pressure, assumed business benefit, and data volumes.
4 Define the target model based on the source, and map source to target.
5 Implement the baseline technical architecture and establish metadata, extraction, and load processes as required to support the increment.
6 Create and populate the initial subject areas within the overall data warehouse framework.

Advantages of the Bottom-Up Incremental Approach
This approach has the following advantages:
• This is a “proof of concept” type of approach and therefore it is often appealing to IT.
• It is easier to get IT buy-in for this approach because it is focused on IT.

Disadvantages of the Bottom-Up Incremental Approach
This approach has the following disadvantages:
• Because the solution model is typically developed from source systems, and these source systems have the current business processes encapsulated within them, the overall extensibility of the model is compromised.
• IT is often the last to know about business changes; IT could be designing something that will be out of date before its delivery is complete.
• Because the framework of definition in this approach tends to be much narrower, a significant amount of reengineering work is often required for each increment.
• Because data definitions are rarely agreed upon by the various lines of business for the first increment, the solution may be rejected by the next line of business to be involved.
• IT staff are used to data and not information.
It is unusual for them to consider the temporal aspects of the data, which minimizes the overall benefit to the business.

Oracle Method
• Consists of:
– Online guidelines and manuals
– Workplan templates
– Deliverable templates
• Created by experienced, field-based practitioners for estimating, managing, developing, and delivering business solutions

The Need for an Iterative and Incremental Methodology
The recommended approach to a data warehousing project is an iterative and incremental one. By restricting efforts to those required to bring up and maintain a single subject warehouse, it is much easier to demonstrate value in a relatively short period of time and to obtain management buy-in regarding the potential value of the approach. At the same time, such an approach manages the growth of the data warehouse by developing incremental solutions that comply with a full-scale, enterprisewide data warehouse architecture. The scoped increments are delivered in relatively short timeframes while complying with the strategic data warehouse architecture.
Data Warehouse Method (DWM) is Oracle’s full life-cycle approach to delivering data warehouse solutions. DWM is part of Oracle Method, which is Oracle’s integrated approach to solution delivery.

Oracle Method
Oracle Method (OM) provides the means to document, standardize, reuse, and improve the way that we deliver services. It consists of online guidelines and manuals, workplan templates, and deliverable templates created by experienced, field-based practitioners for estimating, managing, developing, and delivering business solutions.

Oracle Data Warehouse Method
• Guides through development:
– Business functions
– Processes
– Tasks
• Modeled on the Custom Development Method

Method Materials
Software tools:
• Workplan templates*
• Deliverable templates*
• Online handbooks
• Estimating software
Handbooks:
• Method handbook
• Process and task reference*
• Deliverable reference*
*Not yet available in production

Oracle Data Warehouse Method
The Oracle Data Warehouse Method (DWM) is based on the proven Oracle Method, which documents, standardizes, and improves the way services are delivered.
Services include initial strategic studies, business process reengineering, custom and package application implementation, change management, and program management. Because these services follow a standard approach to defining tasks and deliverables, they are easily integrated to suit your needs.

Method Materials
The Oracle Method includes software and hard copy handbooks for all lines of business. These components of the Oracle Method assist all members of your project team, from project managers to analysts to developers.
The software includes:
• Workplan templates*
• Deliverable templates*
• Online handbooks
• Estimating software
The hard copy handbooks contain:
• Method handbook
• Process and task reference*
• Deliverable reference*
* Not yet available in production; planned for later releases.

Oracle Data Warehouse Method
• Focuses on scoping
• Manages risk
• Relies on user involvement throughout
• Delivers an extensible, scalable solution
• Uses a variety of technologies
• Identifies tasks with clear objectives and deliverables
• Employs common techniques, skills, and dependencies
• Assigns tasks to processes and processes to phases

Benefits
• Consistency
• Flexibility
• Experience and best practices
• Productivity
• Risk avoidance
Oracle Data Warehouse Method
A warehouse project has many challenges, and the method addresses them by:
• Focusing on scoping and requirements, and creating a data warehouse architecture that is flexible and able to flourish in a dynamic business environment with unpredictable uses
• Managing the risk of a data warehouse project by developing a strong business case, including measurements to validate the success of the warehouse
• Involving users throughout the life of the project, and advocating the involvement of a strong executive sponsor from your organization
• Defining the technical and warehouse architecture, integrating all data warehouse components, and delivering an extensible and scalable solution
• Outlining approaches, such as data mart solutions, that produce quick and immediate business benefit while adhering to a strategic architecture
• Employing a variety of technologies available from Oracle and third-party vendors, such as a relational database, OLAP, data acquisition, data access, metadata, and warehouse management technologies
• Laying out the processes and tasks relevant to a data warehouse project, with clear objectives and deliverables
• Assigning tasks to processes, based on common techniques, skills, or dependencies
• Assigning processes to phases, based upon the development approach selected (The end of a phase reflects the completion of a major set of objectives and milestones in a data warehouse development effort.)

Benefits
The experience and best practices provide the following benefits:
• Consistency is achieved among consultants and practitioners because all organizations are working from a common set of tasks and deliverables with a clear understanding of the development processes.
• Productivity is increased by following established approaches and adhering to successful practices. Productivity is also improved by the reduction in mistakes and rework, and by the ability of a consultant to understand the structure and flow of the project very quickly.
• Flexibility is gained by providing a structured development environment that allows personnel to be used efficiently based on skills and availability. Flexibility is also achieved by using a common set of tasks as a foundation for the project, with the ability to customize the tasks based on the needs of each client.
• Low risk is achieved through the use of a common set of tasks that outlines the best ways of developing a warehouse. Mistakes are avoided, and the impacts of decisions can be evaluated within the framework and guidelines of experience.

DWM Fundamental Elements
• Approaches
• Phases
• Processes
• Tasks and deliverables
• Roles
[Slide diagram labels: Phase 1 to Phase 3, Process 1 and Process 2, and Task 1 to Task 3 within each phase.]
DWM Fundamental Elements
The fundamental elements of DWM are:
• Approaches: Data Warehouse Method is an umbrella method that must apply to any type of warehouse engagement, from the smallest OLAP engagement to the largest multiterabyte one that also includes data access. For this reason, a series of approaches have been defined. These approaches make the method more accessible by tailoring it to specific types of service offerings.
• Phases: A phase is a grouping of processes with a common objective.
• Processes: A process is a grouping of tasks with a common objective. The tasks in a process also typically share a common skill set.
• Tasks and deliverables: A task is defined as a unit of work that results in the output of a single deliverable. As the most elementary unit of work, tasks provide the core of the work breakdown structure (WBS). A WBS simply groups tasks into a hierarchy for planning and scheduling purposes.
• Roles: A role is a skill set of resources assigned to a project.

[Slide: Approaches. Diagram labels: Incremental (Increment I, Proof of Concept, warehouse infrastructure implementation, business application implementation, Increment II through N) and Packaged data mart (data mart, warehouse, Increment II through N).]
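The relationship among phases, processes, tasks, and deliverables described under DWM Fundamental Elements can be illustrated with a small sketch. This is only an illustration and not part of the Oracle Method materials: the class names, the task names, and the deliverable names are hypothetical, and only the phase and process names (Analysis, Design, Data acquisition, Data access) are taken from this lesson. It assumes that each task belongs to one phase and one process and produces a single deliverable.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    """Most elementary unit of work; results in a single deliverable."""
    name: str
    deliverable: str
    phase: str    # phase in which the task is performed
    process: str  # process (tasks with a common objective) it belongs to


@dataclass
class WorkBreakdownStructure:
    """Groups tasks into a phase -> process -> task hierarchy."""
    tasks: list[Task] = field(default_factory=list)

    def add(self, task: Task) -> None:
        self.tasks.append(task)

    def hierarchy(self) -> dict[str, dict[str, list[str]]]:
        # Build the hierarchy in insertion order for planning purposes.
        tree: dict[str, dict[str, list[str]]] = {}
        for task in self.tasks:
            tree.setdefault(task.phase, {}).setdefault(task.process, []).append(task.name)
        return tree

    def deliverables(self, phase: str) -> list[str]:
        # One deliverable per task, so this is also the task count for the phase.
        return [t.deliverable for t in self.tasks if t.phase == phase]


# Hypothetical tasks and deliverables, for illustration only.
wbs = WorkBreakdownStructure()
wbs.add(Task("Map source data to target", "Source-to-target mapping", "Analysis", "Data acquisition"))
wbs.add(Task("Collect query requirements", "Query requirements list", "Analysis", "Data access"))
wbs.add(Task("Design load modules", "Load module design", "Design", "Data acquisition"))

for phase, processes in wbs.hierarchy().items():
    print(phase)
    for process, names in processes.items():
        print("  " + process + ": " + ", ".join(names))
```

Note that a process such as Data acquisition appears under more than one phase, which matches the slide diagram in which processes run across phases.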
Data Warehouse Method Approaches
Methods are developed and documented by phase. Phasing is a useful and necessary concept for managing projects, but it can cause unnecessary overhead and project inefficiencies if only one phasing model is available for all sizes and types of projects. Based on the type of data warehouse solution required, you determine the development approach that is right for the project. Currently, DWM incorporates different project phasing models.

Incremental
The incremental approach is proven and is considered the best development practice for data warehousing. This is because it delivers immediate and consistent benefits to the organization while balancing the delivery of incremental solutions with a strong, long-term data warehouse architecture. The goal of the incremental approach is to provide benefits quickly during the initial increment. Each incremental development effort for the data warehouse solution must be defined and scoped. This allows complexity and risk to be managed and work done in prior increments to be reused and leveraged. Each increment should support a well-defined, long-term data warehouse architecture designed for corporatewide data and all functional areas of the client organization. The incremental approach enables you to develop increments in order of business need or highest return on investment (ROI).

Packaged Implementation
The packaged implementation approach is a viable alternative for quickly delivering a useful warehouse solution that is focused on a specific business function, that is, for creating data mart solutions. Because a data warehouse begins to deliver value as soon as the first query is run, implementing a package solution can maximize the client’s potential to identify and leverage opportunities quickly, and hence to gain a competitive advantage.
[Slide: Incremental Approach. Diagram labels: Business Strategy, IT Strategy, Warehouse Strategy Phase, Scoping Services, Requirements Capture, Technical Architecture Services, Warehouse Infrastructure Services, Warehouse Business Solution Services, Proof of Concept, Increments 1 through n, Increments A through z.]

Incremental Development
• Focus on business functionality
• Deliver business benefit
• Suited to warehouse evolution
• Once an increment is complete, the selection and scope of the next increment is defined
• Each increment follows the same phase sequence
[Slide diagram labels: Strategy, Definition, Analysis, Design, Build, Transition to Production, Discovery; PGM/PJM Project and Program Management; ETA Enterprise Technical Architecture.]

Incremental Approach
The incremental approach is the preferred Oracle approach to building an enterprise data warehouse solution; it is effective and proven. This approach manages the growth of the data warehouse by developing incremental solutions that comply with the full-scale data warehouse architecture.
The architecture is designed to provide a solid framework for the long-term data warehouse. It includes a central data warehouse with corporate data for all functional areas, and the functionality to populate, manage, and access the full-scale data warehouse. The data warehouse also controls and feeds each data mart within the architecture. By establishing this architecture, the strategic data warehouse can grow incrementally while supporting data extensibility and avoiding a divergent group of data marts.

Incremental Development
The increments start with the strategy phase, which defines the overall data warehouse solution and architecture at a high level, including:
• Scope of entire solution
• Identification and prioritizing of increments
• Initial technical architecture
• Initial data warehouse architecture
An initial increment is then developed following the phasing model. The increment is usually scoped to provide maximum benefit, target a specific user audience, and ensure that the concept can be proved. At the end of each increment, the discovery phase acts as the review and evaluation phase. Subsequent increments follow the same phasing approach, building on experiences gained and lessons learned from development of the first increment.

Data Mart Development
DWM also provides an approach for the development of a solution scoped to address the requirements of a specific functional area or organization: a data mart solution.
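The phasing described under Incremental Development can be sketched as a simple schedule. This is a hedged illustration only: the function name and the increment labels are hypothetical, and the sketch assumes that the strategy phase runs once for the overall solution and that every increment then runs Definition through Discovery in the same order, which is one reading of the text above.

```python
from typing import Iterable, Iterator, Tuple

# Phase names as used in this lesson.
STRATEGY = "Strategy"
INCREMENT_PHASES = [
    "Definition",
    "Analysis",
    "Design",
    "Build",
    "Transition to Production",
    "Discovery",  # review and evaluation; selects the next increment
]


def phase_schedule(increments: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (scope, phase) pairs in the order they would be carried out.

    Assumption: Strategy is performed once up front, then each increment
    follows the same phase sequence.
    """
    yield ("Overall solution", STRATEGY)
    for increment in increments:
        for phase in INCREMENT_PHASES:
            yield (increment, phase)


# Hypothetical increment labels, for illustration only.
schedule = list(phase_schedule(["Increment 1", "Increment 2"]))
for scope, phase in schedule:
    print(scope + ": " + phase)
```

Keeping the per-increment phase list in one place reflects the point that subsequent increments follow the same phasing approach as the first.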
[Slides: The Strategy Phase. Phase sequence: Strategy, Definition, Analysis, Design, Build, Transition, Discovery. Processes shown for Strategy: business requirements, data acquisition, architecture, data quality, administration, metadata, data access, documentation, testing, training.]

Phases of the Incremental Approach

Strategy Phase
The goal of the strategy phase is to clearly define the business objectives and purpose of the data warehouse solution. Business objectives for the data warehouse project must be driven by top management and must be business-centric. The purpose and objectives for the total data warehouse solution are essential to setting and managing expectations. The strategy phase also clearly defines the data warehouse team and the executive sponsor.
The overall objectives of the strategy phase include:
• Achieve a clear awareness of the business goals and objectives.
• Derive the data warehouse scope from business objectives.
• Document a clear definition of the data warehouse scope in its entirety.
• Document the incremental approach used to support the business objectives.
• Define success measurements.
• Identify the operational and external data sources required to support the business goals.
• Outline the strategies for data acquisition and data quality.
• Define the strategy for warehouse administration.
• Identify the role of metadata and document the strategy for metadata management.
• Define the data access methods necessary to support business objectives.
• Describe the strategy for warehouse documentation and training.
• Identify the testing methods necessary to support user acceptance.
• Identify the existing technical architecture and capacity plan.
• Create the enterprise data warehouse architecture.
• Determine the configuration and capacity requirements.
Prerequisite information needed for the strategy phase includes:
• High-level business descriptions and existing reference material
• Source system documentation and data models, including external data providers
Note: Without a complete understanding of the business objectives and scope of the overall warehouse, you will not be able to proceed successfully.

[Slides: The Definition Phase. Phase sequence: Strategy, Definition, Analysis, Design, Build, Transition, Discovery. Processes shown for Definition: business requirements, data acquisition, architecture, data quality, administration, metadata management, data access, documentation, training.]
Phases of the Incremental Approach (continued)

Definition Phase
The goal of the definition phase is to clearly define the scope and objectives for the incremental development effort. For the initial increment, conceptual models are created, data sources are documented, and the scope of data quality is clearly defined. The technical architecture and data warehouse architecture are also created.
The overall objectives of the definition phase include:
• Document a clear scope of the definition phase.
• Understand operational and external data sources.
• Plan for the initial load and refresh of the warehouse.
• Define the interface, configuration, and capacity requirements.
• Integrate metadata.
• Define the scope of the data quality effort.
• Outline warehouse administration efforts.
• Outline data access methods.
• Train the user community.
Prerequisite information needed for the definition phase includes:
• Business goals and objectives
• Data warehouse purpose, objectives, and scope
• Enterprise data warehouse logical model
• Source system data flows
• Subject area gap analysis
• Data acquisition strategy
• Data warehouse architecture and technical infrastructure
• Data access environment and data quality strategy
• Data warehouse administration strategy, metadata strategy, and training strategy

[Slide: The Analysis Phase. Phase sequence: Strategy, Definition, Analysis, Design, Build, Transition, Discovery. Processes shown for Analysis: business requirements, data acquisition, architecture, data quality, administration.]
[Slide: The Analysis Phase (continued). Processes shown for Analysis: metadata, data access, documentation, testing, training.]

Phases of the Incremental Approach (continued)

Analysis Phase
The goal of the analysis phase is to focus on the users’ information, data acquisition, and data access requirements for business analysis and decision making. Relational and multidimensional models are produced for the data warehouse, metadata, and if appropriate, the data marts. Tool selection is also completed for all appropriate warehouse components during this phase.
The overall objectives of the analysis phase include:
• Collect and model detailed data requirements, including summarization, to support the business requirements.
• Identify and model multidimensional structures.
• Map source data to target database objects.
• Resolve design conflicts and data quality issues.
• Collect and model metadata requirements.
• Collect detailed data access, reports, and query requirements.
• Select the appropriate tools for data acquisition, data quality, administration, metadata, and data access components of the warehouse project.
Prerequisite information needed for the analysis phase includes:
• Business goals and objectives
• Data warehouse purpose, objectives, and scope
• Detailed data load, refresh, and summarization plan
• Detailed data quality acceptance plan
• Data warehouse architecture, technical infrastructure, and capacity plan
• Warehouse administration and metadata integration plans
• Data access and training plans
• Lists of viable data acquisition tools, data quality tools, metadata tools, and data access tools

[Slides: The Design Phase. Phase sequence: Strategy, Definition, Analysis, Design, Build, Transition, Discovery. Processes shown for Design: data acquisition, architecture, data quality, administration, metadata management, data access, database design and build, documentation, testing, training, transition.]

Phases of the Incremental Approach (continued)

Design Phase
The goal of the design phase is to transform the requirements identified during the analysis phase into detailed design specifications and to complete the technical architecture installation.
The overall objectives of the design phase include:
• Document a clear scope of the design phase.
• Design the initial data load and refresh modules.
• Execute the hardware and software installation plan.
• Design the data cleansing, error and exception handling, and audit and control modules.
• Outline the metadata specifications for reporting, bridging, and capturing.
• Design the end user layer and standard queries and reports.
• Establish and document the user and role access privileges.
• Create the database designs for the data warehouse, data mart, metadata repository, and multidimensional structures identified during the analysis phase.
• Document the initial version of all modules designed.
• Create the test plans for integration testing, system testing, regression testing, volume testing, and ad hoc query testing.
Prerequisite information needed for the design phase includes:
• The initial data load and refresh requirements
• The technical infrastructure and data warehouse architecture
• The data acquisition plan
• The metadata requirements
• The data access requirements
• The test strategy

[Slides: The Build Phase. Phase sequence: Strategy, Definition, Analysis, Design, Build, Transition, Discovery. Processes shown for Build: data acquisition, architecture, data quality, administration, metadata management, data access, database design and build, documentation, testing, training, transition.]
Phases of the Incremental Approach (continued)

Build Phase
The goal of the build phase is to create and test the database structures, data acquisition modules, warehouse administration modules, metadata modules, data access modules, and reports and queries.
The overall objectives of the build phase include:
• Deliver a well-designed, thoroughly tested, and integrated data warehouse solution.
• Optimize the database structures to meet design standards and performance objectives.
• Deliver access components.
• Deliver documentation for using and maintaining the warehouse.
Prerequisite information needed for the build phase includes:
• The data acquisition module designs
• The technical architecture and capacity plan
• The data quality and issue resolution plans
• The warehouse administration and scheduling plan
• The metadata implementation plan
• Specifications for the end-user layer, standard queries and reports, roles and privileges, and query governor limits
• The logical and physical database and multidimensional database design
• The index and data storage design
• The user guide, the metadata reference guide, and the warehouse administration reference
• Test plans for integration testing, system testing, environment testing, regression testing, and ad hoc access testing
[Slide: Transition to Production Phase. Phase diagram (Strategy, Definition, Analysis, Design, Build, Transition, Discovery) shown against the processes Data acquisition, Testing, Training, Transition, and Post-implementation support.]
[Slide: Discovery Phase. The same phase diagram shown against the process Post-implementation support.]

Phases of the Incremental Approach (continued)

Transition to Production Phase
The goal of the transition to production phase is to install the warehouse, go to production, prepare the users to use and manage the solution, and begin managing the growth and maintenance of the warehouse.

The overall objectives of this phase include:
• Install the warehouse solution.
• Prepare users to use the warehouse and support personnel to manage the warehouse.
• Populate the production database with production data on the production platform, using production modules.
• Deliver an integrated warehouse and monitor the performance and end-user access.
• Identify additional access and informational requirements.
Prerequisite information needed for the transition to production phase includes:
• All production implementation modules
• The integrated data warehouse architecture and technical infrastructure
• Production data
• Installation plan
• System documentation
• Training materials

Discovery Phase
The goal of this phase is to evaluate the implemented increment, identify increment opportunities, and identify and plan for the next increment. This enables users and developers to analyze the effort most recently undertaken, make adjustments, review the possible increments, and select the next effort based on business need and data warehouse infrastructure need.

The overall objectives of this phase include:
• Perform a detailed evaluation of the implemented increment.
• Identify opportunities and select the next increment.
• Evaluate the completed project plan and consider experiences and lessons learned from previous efforts.
• Drive ongoing data warehouse development with business need and user input.

Prerequisite information needed for the discovery phase includes:
• System in production
• Increment project plan
• Use log evaluation
• Enterprise data warehouse implementation road map and infrastructure road map
• Enterprise data warehouse architecture and technical architecture
• Increment technical architecture
• Enterprise data warehouse requirements

Slide: Processes
• Cohesive set of tasks that meet objectives
• Common skill set
• Project deliverables
Most overlap and interrelate; others are strict predecessors
Processes
A process is a cohesive set of related tasks that meets a specific project objective and results in key deliverables. Each process is a discipline involving similar skills to perform the tasks within the process. You might think of a process as a simultaneous subproject within a larger development project.

Every data warehouse project involves most if not all of the following processes, whether they are the responsibility of the consulting team, the client, IT staff, a third party, or a combination of these. Most processes overlap in time with others and are interrelated through common deliverables, while others are strict predecessors of each other.
• Business Requirements Definition
• Data Acquisition
• Architecture
• Data Quality
• Warehouse Administration
• Metadata Management
• Data Access
• Database Design and Build
• Documentation
• Testing
• Training
• Transition
• Post-Implementation Support
Slide: Business Requirements Definition
• Defines requirements
• Clarifies scope
• Establishes implementation road map
• Provides initial focus on enterprise implementation
• Identifies information needs
• Models the requirements

Slide: Data Acquisition
• Identify, extract, transform, and transport source data
• Consider internal and external data
• Move data between sources and target
• Perform gap analysis between source data and target database objects
• Define first-time load and refresh strategy
• Define tool requirements
• Build, test, and execute data acquisition modules

Business Requirements Definition
The Business Requirements Definition process defines the requirements, clarifies the scope, and establishes the implementation road map of the data warehouse. With the direction of the business organization, strategic business goals and initiatives are outlined and used to direct the strategies, purpose, and goals of the data warehouse solution. As the process continues, Business Requirements Definition focuses on scoping the solution to be developed and delivered, identifying the warehouse information needs, and modeling the requirements.

Data Acquisition
The Data Acquisition process identifies, extracts, transforms, and transports all source data necessary for the operation of the data warehouse. Data acquisition is performed between several components of the warehouse: from operational and external data sources to the data warehouse, from the data warehouse to data marts, and from data marts to individual marts.
Early in the data acquisition process, data sources are identified and evaluated against the subject areas, and gap analysis is conducted to ensure that the data is available to support the information requirements. Strategies are developed for the first-time load of the warehouse and for the subsequent refreshes of the warehouse. You evaluate tools against high-level requirements and make recommendations. With the detailed analysis output, modules are designed and built to extract, transform, transport, and load the source data into the warehouse. Once built, the modules are tested and executed and the production database objects are populated.

Slide: Architecture
• Specify technical foundation
• Create warehouse architectural design
• Integrate products of architecture components for scalability and flexibility
• Determine database environment: distributed or centralized
• Define development, testing, training, and production environments
• Configure the platform
• Perform database sizing
• Consider disk striping

Slide: Data Quality
• Ensure data consistency, reliability, accuracy
• Develop a strategy for:
  – Cleansing
  – Integrity functions
  – Quality management procedures
• Identify business rules for:
  – Cleansing
  – Error handling
  – Audit and control
• Define data quality tool requirements
• Build, test, and execute data quality modules
Architecture
The Architecture process specifies elements of the technical foundation and architectural design of the data warehouse. The focus is on integrating different products and the data warehouse components to ensure an extensible and scalable architecture.

For the technical architecture, an evaluation is performed to determine whether the database environment should be distributed or centralized. Network, hardware, and software requirements, including acquisition; infrastructure changes; and the platform configuration are defined and implemented. The platform configuration covers the data acquisition environment, server architecture, middleware, database sizing, and disk striping. The architecture work keeps the strategic data warehouse architecture integrated while incremental solutions are delivered.

Data Quality
The Data Quality process ensures the consistency, reliability, and accuracy of the data in the warehouse. A data quality strategy is developed based upon a clear understanding of the agreements and contractual obligations for data cleansing, audit and control, and integrity functions. Data management procedures are defined. Data quality tools are evaluated and recommended.

The process identifies the business rules for error and exception handling, scrubbing and cleansing, and audit and control. The business rules for error handling may vary between the initial load and subsequent updates to the data warehouse. Using the data quality strategy, procedures, and tools, modules are developed to support the requirements for data quality.
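As an illustration of the Data Quality notes above, the following is a minimal sketch of a data quality module that applies cleansing, error handling, and audit-and-control rules, with error handling that differs between the initial load and later refreshes. The record fields (customer, amount), the specific rules, and the choice to abort a refresh on the first bad row are all assumptions made for the example; they are not part of DWM.

```python
from dataclasses import dataclass, field


@dataclass
class AuditLog:
    """Audit-and-control totals for one load run."""
    accepted: int = 0
    rejected: list = field(default_factory=list)  # (raw record, error list) pairs


def cleanse(record):
    """Scrub a raw record: trim whitespace and normalize name casing."""
    return {
        "customer": (record.get("customer") or "").strip().title(),
        "amount": record.get("amount"),
    }


def validate(record):
    """Apply the hypothetical business rules; return a list of violations."""
    errors = []
    if not record["customer"]:
        errors.append("missing customer")
    amount = record["amount"]
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("invalid amount")
    return errors


def load(records, mode):
    """Cleanse and validate records under a mode-dependent error policy.

    mode="initial": bad rows go to the exception list and the load continues.
    mode="refresh": the first bad row aborts the run (an assumed stricter policy).
    """
    audit = AuditLog()
    clean_rows = []
    for raw in records:
        row = cleanse(raw)
        errors = validate(row)
        if not errors:
            clean_rows.append(row)
            audit.accepted += 1
        elif mode == "initial":
            audit.rejected.append((raw, errors))
        else:
            raise ValueError(f"refresh aborted: {errors}")
    return clean_rows, audit
```

Calling load(rows, "initial") returns the clean rows together with an AuditLog whose rejected list can feed an exception report; a real project would take these rules from the agreed data quality strategy.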
Slide: Warehouse Administration
• Specify maintenance strategy for:
  – Configuration management
  – Warehouse management
  – Data governing
• Define warehouse management workflow and tool requirements
• Build, test, and execute modules
• Provide data access management and monitoring
• Automate warehouse management tasks

Slide: Metadata Management
• Define metadata strategy
• Define metadata types
• Specify requirements for the metadata repository, integration, and access
• Establish technical and business views of metadata
• Develop modules for capturing, bridging, and accessing metadata

Warehouse Administration
The Warehouse Administration process specifies the strategy and requirements for the maintenance, use, and ongoing update of the data warehouse. Strategies are established for configuration management, warehouse administration, and data governing. Warehouse administration workflow, tool evaluation, and testing are addressed. Modules are designed and built for scheduling, backup and recovery, archiving, security, audit, and data governing.
Several data access management and monitoring tasks are addressed during this process, including authorizing access to appropriate levels of data, monitoring usage, governing queries, identifying repetitive queries, calculating metrics, defining access thresholds, adding or removing users, and updating access authority.

To provide successful ongoing support and maintenance of the warehouse, this process focuses on the automation of the warehouse management tasks. The process also defines strategies for security and control, backup and recovery, disaster recovery, archiving, and restoration.

Metadata Management
The Metadata Management process specifies the metadata strategy and the requirements for the metadata repository, integration, and access. The primary objective of this process is to provide technical and business views of the warehouse metadata.
• The technical view focuses on compiling the metadata to support warehouse management. This view includes data acquisition rules; transformation of source data to the target database; time and date of data; data authorization; refresh, archive, and backup schedules and results; and the data accessed, including metrics such as frequency and volume of requests.
• The business view focuses on enabling users to understand the information available in the warehouse and how it may be accessed. The business metadata focuses on what data is in the warehouse, the source of the data, how it was transformed from source to target, and information compiled while accessing the warehouse.

The Metadata Management process also develops the modules for capturing, bridging, and accessing the metadata. Metadata is created by several data warehouse components, such as data acquisition, database design, and data access.
Each component, particularly if supported by a tool, has its own metadata storage facility and access capabilities; therefore, the disparate metadata must be linked using bridging capabilities to ensure consistency and to facilitate access by the appropriate personnel.

Slide: Data Access
• Identify, select, and design user access tools
• Define user profiles
• Determine requirements for interface style, queries, reports, and the end-user layer
• Evaluate, acquire, and install access tools
• Design and develop data access objects
  – Queries and reports
  – Catalogs
  – Hierarchies and dimensions

Slide: Database Design and Build
• Support data requirements
• Provide efficient access
• Create and validate logical and physical models
• Create relational and multidimensional database objects
• Evaluate partitioning, segmentation, and placement
• Identify indexes and keys
• Generate DDL
• Build and implement database objects

Data Access
The Data Access process focuses on identifying, selecting, and designing tools to support user access to data.
A strategy is established and the user requirements are defined as a framework for the data access environment. Tools are evaluated, tested, and recommended. User profiles are defined based on the level of data required to support their analysis, decision-making requirements, and skill level. Detailed requirements are also collected for the user interface style and for queries and reports.

With the user profiles, functional requirements, and levels of data to be accessed, tool criteria are established for each data access component. In most cases, data access is supported by a variety of tools rather than one tool to support everyone. After tools are selected and installed, the data access objects are designed and developed, including canned queries and reports, catalogs, metadata retrieval, hierarchies, dimensions, user layer schemas, and user interfaces.

Database Design and Build
The Database Design and Build process implements the design of database objects that support the data requirements and ensure efficient access to the data. This process focuses on creating and validating the database logical and physical designs for the relational and multidimensional database. Physical data partitioning, segmentation, and data placement are evaluated against business and user requirements and operational constraints. Indexes and key definitions are decided. The database data definition language (DDL) is generated and is used to build and implement the development, testing, and production data warehouse database objects.
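To make the DDL generation step described above more concrete, here is a minimal sketch that turns a small physical model description into CREATE TABLE and CREATE INDEX statements. The model format, the sales_fact table, its columns, and the index naming rule are hypothetical, and the Oracle-style NUMBER types are an assumption; design tools normally generate DDL from a much richer model.

```python
# Hypothetical physical model for one fact table (names are illustrative only).
SALES_FACT = {
    "name": "sales_fact",
    "columns": [
        ("sale_id", "NUMBER"),
        ("date_key", "NUMBER"),
        ("product_key", "NUMBER"),
        ("amount", "NUMBER(12,2)"),
    ],
    "primary_key": ["sale_id"],
    "indexes": [["date_key"], ["product_key"]],
}


def generate_ddl(table):
    """Return a list of DDL statements for one table description."""
    lines = [f"    {name} {sqltype}" for name, sqltype in table["columns"]]
    if table.get("primary_key"):
        key_cols = ", ".join(table["primary_key"])
        lines.append(f"    PRIMARY KEY ({key_cols})")
    body = ",\n".join(lines)
    statements = [f"CREATE TABLE {table['name']} (\n{body}\n);"]
    # One index per entry; the index name is derived from table and column names.
    for idx_cols in table.get("indexes", []):
        idx_name = f"{table['name']}_{'_'.join(idx_cols)}_idx"
        col_list = ", ".join(idx_cols)
        statements.append(
            f"CREATE INDEX {idx_name} ON {table['name']} ({col_list});"
        )
    return statements


if __name__ == "__main__":
    for statement in generate_ddl(SALES_FACT):
        print(statement)
```

Keeping the model as data and generating the statements from it means the same description can be used to build the development, testing, and production objects consistently.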
Slide: Documentation
Produce textual deliverables:
• Glossary
• User and technical documentation
• Online help
• Metadata reference guide
• Warehouse management reference
• New features guide

Slide: Testing
• Develop a test strategy
• Create test plans, scripts, and scenarios
• Test all components:
  – Data acquisition
  – Data access
  – Ad hoc access
  – Regression
  – Volume
  – Backup
  – Recovery
• Support acceptance testing

Documentation
The Documentation process focuses on producing all user and technical documentation for the data warehouse, including references, user and system operations guides, and online help. To ensure active and successful use of the warehouse, the metadata reference guide describes the contents of the data warehouse in business terms and provides a navigational road map to the contents of the data warehouse. In addition, the warehouse management documentation outlines the workflow and manual and automated management procedures. The new features guide highlights any enhancements to warehouse functionality that result from the implementation of the solution.

Testing
The Testing process is an integrated approach to testing the quality of all components of the data warehouse. The testing strategy is developed and approved before the test system is created. System integration and module test plans, test scripts, and test scenarios are developed. Each test is performed and proven. Testing includes proving the physical design of the database.
Data acquisition modules, data access tools, and canned queries and reports also undergo thorough module and integration testing. The testing strategy addresses all components of the solution, including the ad hoc access processes. Regression testing compares changes to the data warehouse against a baseline to ensure that existing functionality still works when an enhancement is added. Volume testing is conducted on the production platform to ensure that performance meets established objectives. Preparation of the acceptance environment and support for acceptance testing are also performed during the Testing process.

Slide: Training
• Define requirements:
  – Technical
  – End user
  – Business
• Identify staff to be trained
• Establish time frames
• Design and develop materials
• Focus on tool training and use of the warehouse

Slide: Transition
• Define tasks for transitioning to the production warehouse
• Migrate modules and procedures
• Develop the installation plan
• Prepare the maintenance environment
• Prepare the production environment
Training
The Training process defines the development and user training requirements, identifies the technical and business personnel requiring training, and establishes time frames for executing the training plans. Training plans and training materials are designed and developed. User and technical training is conducted.

The key objective is to provide both users and administrators with adequate training to take on the tasks of operating, maintaining, and using the data warehouse solution. Training should focus on tool training and how business value is generated from the information in the data warehouse.

Transition
The Transition process focuses on the tasks required to move to the production data warehouse, including creating the installation plan and preparing the maintenance and production environments. During this process, the warehouse management workflow is implemented and the production data warehouse is made available.

Slide: Post-Implementation Support
• Evaluate and review warehouse use
• Monitor warehouse use
• Refresh the warehouse
• Monitor and respond to problems
• Conduct performance testing and tuning
• Transfer responsibility
• Evaluate and review the implemented solution
Post-Implementation Support
The Post-Implementation Support process provides an opportunity to evaluate and review the solution. You evaluate use of the warehouse by accessing metadata and evaluating queries and reports run against the warehouse. The information assists with management of standard queries and reports, and the user layer, and identifies required indexes.

The process also focuses on refreshing the warehouse, monitoring and responding to system problems, correcting errors, and conducting performance and tuning activities for all components of the data warehouse.

Other actions at this time include:
• Change control for information requirements
• Roll out of metadata, queries, reports, filters, and conditions
• Library of shared objects
• Security
• Incorporation of new users
• Distribution of data marts and catalogs

During this process, responsibility for the data warehouse may be transferred from information system (IS) staff to the owning organization.
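The evaluation of warehouse use described above can be sketched in a few lines. The example below assumes a hypothetical usage log in which each entry records a user and the SQL text that was run; it normalizes the text and counts repeats, so that frequently repeated queries can be reviewed as candidates for standard reports or supporting indexes. The log layout and the threshold are assumptions, not part of DWM.

```python
from collections import Counter


def normalize(sql):
    """Lowercase the statement and collapse runs of whitespace."""
    return " ".join(sql.lower().split())


def repetitive_queries(log, min_count=2):
    """Return (query, count) pairs seen at least min_count times, most frequent first."""
    counts = Counter(normalize(entry["sql"]) for entry in log)
    return [(query, n) for query, n in counts.most_common() if n >= min_count]


# Hypothetical usage log entries, as they might be captured by access metadata.
USAGE_LOG = [
    {"user": "a", "sql": "SELECT region, SUM(amount) FROM sales GROUP BY region"},
    {"user": "b", "sql": "select region,  sum(amount) from sales group by region"},
    {"user": "a", "sql": "SELECT * FROM products"},
]

if __name__ == "__main__":
    for query, n in repetitive_queries(USAGE_LOG):
        print(n, query)
```

Text normalization is deliberately crude here; queries that differ only in literal values would need parameter stripping before they count as the same query.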
Slide: Tasks and Deliverables
• Outlined in Work Breakdown Structure
• Organized by process and phase

Task ID | Task Name
A | Strategy
A.RD.EXEC | Business Requirements Definition
A.RD.001 | Obtain Existing Reference Material
A.RD.002 | Obtain Reference Data Models
A.RD.003 | Define Strategic Goals, Vision of the Enterprise
A.RD.004 | Establish Business Initiatives
A.RD.005 | Define Objectives and Purpose of Enterprise Data Warehouse
A.RD.015 | Collect Enterprise Business Information Requirements
A.RD.034 | Document Data Warehouse Subject Areas
A.RD.035 | Create Data Warehouse Subject Area Data Model
A.RD.044 | Define Data Warehouse Implementation Roadmap
A.RD.045 | Prepare Business Case for Enterprise Data Warehouse

Tasks and Deliverables
Tasks are the foundation for the work breakdown structure (WBS). Each task is assigned to a process and phase within an approach. DWM identifies tasks and deliverables that are included in a full life-cycle development project. They are fully outlined in the Work Breakdown Structure and are organized by process and phase.

Below you see a sample of tasks as identified in the WBS.
Task ID | Task Name | Deliverable
A | Strategy |
A.RD.EXEC | Business Requirements Definition |
A.RD.BEG | Begin Strategy Execution |
A.RD.001 | Obtain Existing Reference Material | Existing Reference Material
A.RD.002 | Obtain Reference Data Models | Reference Data Models
A.RD.003 | Define Strategic Goals, Vision, and Initiatives of the Enterprise | Enterprise Goals, Vision, and Initiatives
A.RD.004 | Define Objectives and Purpose of Enterprise Data Warehouse | Enterprise Data Warehouse Statement of Value
A.RD.015 | Collect Enterprise Information Requirements | Enterprise DW Information Requirements
A.RD.034 | Create Enterprise DW Logical Data Model | Enterprise DW Logical Model
A.RD.044 | Define Enterprise DW Implementation Roadmap | Enterprise DW Implementation Roadmap
A.RD.045 | Prepare Business Case for Enterprise Data Warehouse | Enterprise DW Business Case

Slide: Roles
• The project team: roles and responsibilities
• Common roles: analyst, database administrator, programmer, tester
• Warehouse-specific roles: DW architect, metadata architect, data quality administrator, DW administrator

Roles
A warehouse project is complex in many ways, especially in the makeup of the project team.
The DWM identifies the roles required and the main responsibilities of each role.

It identifies roles that are common within technology departments, such as:
• Development database administrator, who works closely with the system administrator
• Lead tester, who oversees the test script planning, development, and execution activities
• Production database administrator, who installs and configures the production database and maintains database access controls

It identifies roles that are unique to data warehouse projects, for example:
• Data warehouse administrator: The data warehouse administrator is responsible for warehouse management, maintenance, and the total data warehouse production environment.
• Data warehouse architect: The data warehouse architect establishes the strategic data warehouse architecture and manages the integration of the developed increments with the wider data warehouse architecture.
• Data warehouse database designer: The data warehouse database designer is responsible for producing the logical and physical database designs for the data warehouse and data mart and for metadata objects.

Within this element of the method, other roles are identified.

Slide: Warehouse Technology Initiative
• Customer driven
  – Warehouse products only
  – Quality, not quantity
  – High-value partnerships
• Requires
  – Oracle certified solution partner level
  – Product certification
  – References
Oracle Warehouse Technology Initiative (WTI)
A number of leading hardware and software vendors provide warehouse initiatives. They may comprise a solution that uses a single vendor’s products or combines products from multiple vendors.

The Oracle Warehouse Technology Initiative (WTI) offers the Oracle database combined with specialized tools from dedicated warehouse providers. The partner company must supply products with specific functionality that supports data warehouses using an Oracle database.

Oracle Alliance Program
The Oracle Alliance program is a partnership of the world’s leading information technology companies. There are more than 3,000 partners in 93 countries. Through the program, partners and Oracle work together to offer mutually reinforcing products and services that expand markets and lead to greater business success for all. The program includes partners from key segments of the information technology industry, including software developers, hardware vendors, distributors, resellers, consultants, and system integrators.

Slide: WTI Partners by Categories
• Design and administration
• Source
• Manage
• Access
• Data content provider
WTI Partners by Categories
Oracle’s WTI is composed of the following partner categories: design and administration, source, manage, access, and data content providers.

Design and Administration
Products in this category enable you to plan and design a data warehouse from the ground up. They help you identify and qualify the source data, lay out the data structures, and define the mapping between data sources and the target data warehouse.

Source
WTI partners in this category produce tools that help you build and implement the data warehouse. IT professionals use these tools and utilities to extract, transform, cleanse, and move data from source systems into the data warehouse or data marts.

Manage
This category covers products in every area of warehouse management, including administering the database, managing the warehouse metadata, and managing recurring tasks. It includes any tool or utility that enables you to manage or administer an Oracle7, Oracle8, Oracle8i, or Express-based data warehouse or data mart.

Access
Products in this category enable you to view the contents of your data warehouse or data mart database for analysis. Tools include report writers, query products, OLAP software, executive information systems, and data mining. The products embrace a broad range of architectures, from server-only to client-server to Web-based servers.

Data Content Provider
This category includes any enterprise that sells or rents data sets suitable for data warehousing. The data can range from market-share information to demographics to financial-time services.
Summary
This lesson discussed the following topics:
• Explaining the different approaches to warehouse development and the benefits of an incremental approach
• Identifying the purpose of the Oracle Method
• Discussing the purpose and fundamental elements of Data Warehouse Method
• Discussing the objectives of the Oracle Warehouse Technology Initiative
Practice 4-1 Overview
This practice covers the following topics:
• Defining the business requirements of a fictitious beverage company, including the purpose, goals, and strategies of a data warehouse, by interviewing executives
• Uncovering some of the possible issues and challenges in a data warehouse implementation project through the class discussion

Practice 4-1 Exercise
Background Task
You and a team of two or three other people are about to embark on Phase I of a data warehouse project, that is, determining the business requirements. This task involves interviewing executives in your company to define the purpose, goals, and strategies of the data warehouse. In this exercise, you are going to form small groups and role-play the interviewing session with your teammates. Do the following:
• Read through this worksheet. (5 mins)
• Form into groups of four and role-play the interviewing session with your teammates. Each of you will assume a role such as the DW team manager, the chief financial officer (CFO), the chief operating officer (COO), or the information technology (IT) director. Use the interview questions and the background about each character to help you in this exercise. (15 mins)
• Regroup and answer the questions in the class discussion. Give your feedback based on your observations. (20 mins)

Scenario
Krispan Beverages, Inc., produces soft drinks, noncarbonated drinks, mixers, and sparkling waters, and distributes them all over the world. The CFO has been promoting and executing on the concepts of data warehousing for some time.
Some of the executives at Krispan seem to think that they are ready to build a data warehouse to better understand their business and help business decision makers to make better decisions.

Company Profile
Krispan Beverages, Inc., is based in California. The company develops, manufactures, markets, and distributes a full line of branded cola and multiflavored soft drinks, juice products, and bottled water.

Mission Statement
“We exist to create value for our share owners on a long-term basis by building a business that enhances Krispan’s trademarks. We do this by maintaining our market leading status developing superior soft drinks, both carbonated and noncarbonated, and profitable nonalcoholic beverage system, financial analysis, and distribution services using empowered team dynamics in a total quality paradigm.”

Role 1: Data Warehouse Team Manager
He is the data warehouse team manager for the Data Warehousing Implementation team. He is going to interview the following key people using the interview questions below.
• The chief financial officer (CFO), who is also the board-appointed project sponsor of this data warehouse implementation project
• The chief operating officer (COO)
• The IT director (IT Director)

Role 2: CFO
He is the board-appointed project sponsor and the person who has been gaining a lot of profits from the company’s success. He does not want the new systems because they will require a lot of change within his group. He is conservative in his thinking and wants things to go on as before.
He supports the company’s mission statement but only so far as it meets his own agenda.

Role 3: COO
She wants the system because she realizes the power of information, believes that the data warehouse will give her real control in the company, and acknowledges that the data warehouse will enable the company to be more competitive in the marketplace. The COO has a good high-level understanding of what she wants the system to provide her, but she will need significant help in sorting out the details. She understands the vision for the business and fully supports it.

Role 4: IT Director
She does not understand the vision of the business but pretends that she does by quoting it on a regular basis. She is very technically savvy but lacks the business understanding of the organization. She wants power and influence, and believes she can get both of these through the new infrastructure and big systems that are planned.

Interview Questions
Ask the key persons the following questions.

Question to Ask | CFO | COO | IT Director
1 What is the business vision? | | |
2 Why does the company need an enterprise data warehouse? | | |
3 What do you expect the data warehouse to provide or what will you get out of the warehouse? | | |
4 How soon do you need to have data loaded into the data warehouse and how up-to-date does the data need to be? | | |

Class Discussions
1 Identify the major challenges for a data warehousing implementation project, as shown in this exercise.
2 Give your suggestions on how to overcome these challenges.
3 If you apply the Oracle Data Warehouse Method in the implementation of this project, how would you apply it, and where do you see the benefits from using this method?

Lesson 5: Planning for a Successful Warehouse

Overview
[Course overview slide. Blocks: Defining DW Concepts & Terminology; Planning for a Successful Warehouse; Meeting a Business Need; Choosing a Computing Architecture; Planning Warehouse Storage; Modeling the Data Warehouse; ETT (Building the Warehouse); Analyzing User Query Needs; Supporting End User Access; Managing the Data Warehouse; Project Management (Methodology, Maintaining Metadata)]
Overview
The previous lesson introduced the importance of driving a warehouse project by a methodology. This lesson introduces the planning that is critical to the success of a data warehouse project. Planning phases, deliverables, and project roles are identified. Overall warehouse strategy and project scope are defined. Note that the “Planning for a Successful Warehouse” block is highlighted in the overview slide.

Objectives
After completing this lesson, you should be able to do the following:
• Explain the financial issues that must be managed in developing and implementing a data warehouse
• Outline techniques for obtaining business commitment to the warehouse
• Outline the key tasks involved in managing a warehouse project
• Identify the major warehouse planning phases and their deliverables
• List warehouse strategy phase deliverables
• List warehouse scope phase deliverables
Financial Justification
• Intangible Benefits (45%)
  – Remain competitive
  – Respond to changing business conditions
  – Support reorganization
• Better Data and Better Decision Making (25%)
  – Reduce IS costs
  – Better response time
  – Rigorous reporting
• Productivity or ROI (30%)
  – For internal users
  – For external users
Source: Database Associates, “Data Warehouse in Practice,” June 1993

ROI and Associated Costs
• Build a strong case
  – Costs
  – ROI
  – Profitability
  – Efficiency
  – Objectives
• Consider
  – Impact of time for ETT
  – Additional storage requirements
  – Cost of redundant data
  – Cost of database, software licenses, labor

Managing Financial Issues

Financial Justification
The project is a big investment in resources and finances. Management must be able to report on how the data warehouse benefits the business. Justification is divided into three main areas:
• The intangible benefits (45%) are that the business can remain competitive, respond to changing business conditions, and support reorganization.
• Better data and decision making (25%) reduce information technology costs, provide better response times, and provide rigorous reporting.
• Productivity or Return on Investment (ROI) (30%) benefit internal and external users.

Return on Investment
The financial justification must set out a strong case that clearly establishes measurements such as cost versus return on investment, and increased efficiency and profit. It must also set clearly defined objectives that can be monitored and measured.
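
As an illustration of the cost-versus-return measurement described above, the following minimal Python sketch computes a simple ROI and a payback period. It is not part of the Oracle method; the class name, the function names, and every figure are hypothetical and are chosen only to show the arithmetic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WarehouseCase:
    """Hypothetical inputs for a financial justification (illustrative figures only)."""
    initial_cost: float         # e.g. hardware, software licenses, ETT tools, labor
    annual_running_cost: float  # e.g. storage growth, maintenance, administration
    annual_benefit: float       # estimated savings plus added profit per year


def roi(case: WarehouseCase, years: int) -> float:
    """Simple ROI over the period: (total benefit - total cost) / total cost."""
    total_cost = case.initial_cost + case.annual_running_cost * years
    total_benefit = case.annual_benefit * years
    return (total_benefit - total_cost) / total_cost


def payback_years(case: WarehouseCase) -> Optional[float]:
    """Years until the net yearly benefit repays the initial cost, or None if it never does."""
    net_per_year = case.annual_benefit - case.annual_running_cost
    if net_per_year <= 0:
        return None
    return case.initial_cost / net_per_year


# Assumed example values; replace them with the organization's own estimates.
case = WarehouseCase(initial_cost=500_000, annual_running_cost=100_000, annual_benefit=350_000)
print(f"ROI over 3 years: {roi(case, 3):.2%}")
print(f"Payback period: {payback_years(case)} years")
```

A sketch like this covers only the measurable part of the case. The intangible benefits listed earlier still have to be argued in words.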
Associated Costs
Along with cost justification, you should provide a plan that specifies other factors that will impact the cost of the project and other aspects of the business.
• The cost of developing ETT or purchasing the ETT tools
• The actual time required for data cleansing, transformation, and extraction, which may impact day-to-day operations
• Storage requirements for extract, summarization, work space, log space, backup, recovery, and maintenance
• The cost of redundant data
• Hardware and software costs
• The cost of server and system software licenses
• Labor costs
You may regard this as a negative approach because some of these issues have an adverse impact on the business. However, given the enormous size of a data warehouse project, every issue, good or bad, must be clearly understood and appreciated.

Funding the Project
• State that initial system integration costs are high.
• Determine who funds the project:
  – Information systems: development group
  – Department: users
[Diagram labels: Information systems; Department; Selected subject for pilot; Small staff; Short duration; More subjects funded by end-user organizations]

Charging Back Costs
• Some warehouses do not charge initially.
• Benefits:
  – Encourages efficient use
  – Provides shared costs
• Drawbacks:
  – Users cannot dwell on detail.
  – Users try to reduce costs.
  – Machine resources are taken up monitoring use.
Managing Financial Issues

Funding
Initially, the information technology group may fund the project up until the pilot run of the first increment. After the pilot, when the process is proven, funding usually passes to the individual departments, particularly if the implementation is a departmentalized data mart. Debates often arise between information systems and individual departments about who should pay for resources, such as the hardware and software, system (warehouse) monitoring tools, and OLAP tools. Individual departments often express concern that, if they fund tools in the development of one of the first subject areas that will be used for warehouse initiatives, they should be able to recoup part of the investment from other departments who build subject areas and benefit from those tools at a later time. If the information systems department funds the tools, they absorb the cost or can bill back to individual departments as required, over the depreciation life of the tools. In the case of specific data marts (for departments), the cost is often the responsibility of the local department. Some warehouses do not charge for the first few months, usually while the project is being funded by information systems development groups. Once the warehouse is piloted and has proved successful, then charges are normally levied.

Charge Models
There are different models that you may use; none of them are completely fair. There are no chargeback models strictly for the warehouse environment and the best model may be a hybrid, specifically developed in house for the purpose.
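
To make the idea of a hybrid charge model concrete, here is one possible sketch in Python. It is an assumption for illustration, not a model prescribed by this course: part of the monthly warehouse cost is split evenly across departments and the remainder is allocated in proportion to measured usage. The function name, the `fixed_share` parameter, the department names, and the usage figures are all hypothetical.

```python
from typing import Dict


def hybrid_chargeback(monthly_cost: float,
                      usage_by_dept: Dict[str, float],
                      fixed_share: float = 0.4) -> Dict[str, float]:
    """Split monthly_cost: fixed_share evenly per department, the rest by usage.

    usage_by_dept maps a department name to any consistent usage metric
    (for example CPU seconds or query count); the metric itself is an assumption.
    """
    if not usage_by_dept:
        raise ValueError("at least one department is required")
    if not 0.0 <= fixed_share <= 1.0:
        raise ValueError("fixed_share must be between 0 and 1")

    dept_count = len(usage_by_dept)
    fixed_pool = monthly_cost * fixed_share
    variable_pool = monthly_cost - fixed_pool
    total_usage = sum(usage_by_dept.values())

    charges = {}
    for dept, usage in usage_by_dept.items():
        # With no recorded usage at all, fall back to an even split of the variable pool.
        fraction = usage / total_usage if total_usage > 0 else 1.0 / dept_count
        charges[dept] = fixed_pool / dept_count + variable_pool * fraction
    return charges


# Hypothetical monthly figures for three departments.
usage = {"finance": 1200.0, "marketing": 600.0, "operations": 200.0}
for dept, amount in hybrid_chargeback(10_000.0, usage).items():
    print(f"{dept}: {amount:.2f}")
```

Raising `fixed_share` makes charges more predictable for light users, while lowering it rewards efficient use. Whichever split is chosen, the usage metric has to be collected, which is the monitoring overhead noted among the chargeback drawbacks.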
Chargeback Benefits
• Encourages efficient and sensible use of resources
• Promotes realistic ongoing additional requirement requests
• Allows users to share the cost for the data warehouse processing and maintenance

Chargeback Drawbacks
• Users cannot dwell on detail, knowing they are being charged for the service.
• Users may not be motivated to discover more, anticipating that costs may run too high.
• Machine resources are needed to monitor and maintain a charging system.
The business value of tangible, measurable results, in most cases, far outweighs the overhead costs. Even if chargeback strategies are not deployed, the information systems team still needs to monitor warehouse use and can use those metrics to justify future direction.

Obtaining Business Commitment
• Ensure that the warehouse:
  – Has total support
  – Is driven by the business
• Research the problem
• Identify goals, visions, priorities
• Research the solution
• Identify the benefits
• Identify the constraints

Obtaining Business Commitment
A data warehouse implementation requires the total support of those who control the business and make the decisions that drive the business forward.
The warehouse is a business-driven project, not an information technology drive for the latest hardware, software, tools, and techniques. Business objectives must be clear, well defined, measurable, and achievable:
• Research and study the business problem; identify the business vision, goals, and priorities
• Research the solution and define what the warehouse solution may do
• Identify the benefits of the solution, such as efficiency, people power, customer satisfaction, and returns
• Identify the constraints, such as schedule, costs, and experience
Note: Obtaining business commitment is supported by the Business Requirements Definition of the DWM Strategy Phase.

Data Warehouse Champion
• Maintains intergroup communication
• Settles conflicts
• Identifies and solves issues
• Articulates the vision
• Brings in business expertise
• Organizes and supports the team
• Communicates progress
• Brings the data warehouse to life

Steering Committee
• Business executives
• Information systems representatives
• Knowledge workers

• Provides direction
• Decides upon implementation issues
• Sets priorities
• Assists with resource allocation
• Communicates to all levels at all times
Data Warehouse Champion
There must be someone within the organization who remains focused and works to:
• Ensure all groups within the development team communicate.
• Settle conflicts between groups.
• Identify and solve issues or problems at any level.
• Articulate the vision and wisdom of the warehouse to everyone involved in developing and using the warehouse.
• Bring business expertise to the task.
• Organize and support the team.
• Communicate progress, processes, and achievements throughout the organization.
• Bring the data warehouse to life.

Steering Committee
The steering committee should comprise representatives of different sectors within the business:
• Business executives
• Information systems representatives
• Users
The aim of the committee is to:
• Provide business direction.
• Decide upon enterprisewide implementation issues.
• Determine and set development priorities.
• Assist with resource allocation.
• Communicate consistently to all areas and levels of the organization.
Each subject area may have its own detailed project plan, which can be rolled up to a master plan weekly or monthly. The steering committee must be aware of how changes to business direction and priorities affect existing project plans, milestones, and deliverables. They must approach the renegotiation of existing plans tactfully and diplomatically.
Note: The steering committee is not a substitute for the project manager.
Warehouse Data Ownership
• Users must own the data
• Users must be involved throughout
• Users must be part of the steering committee:
  – Enhances cooperation
  – Reduces friction
  – Helps meet requirements
  – Enhances feedback

Warehouse Data Ownership
It is important that users feel they own the warehouse and the data contained within it. If they have a vested interest in the project, they are eager for more information and have an interest in the future use and maintenance of the content. You should involve users throughout the project, making them part of the steering committee. Involving the users in this way leads to:
• Enhanced cooperation between different departments in the business
• Reduced friction among groups or departments, with problem resolution and formal project and change management
• Meeting business requirements
• Continuous and useful feedback
Managing a Warehouse Project
• Determine organizational readiness for the warehouse
• Adopt an incremental approach to warehouse development
• Set expectations
• Manage expectations
• Assemble the project team
• Estimate the data warehouse project
• Recognize critical success factors

Managing a Warehouse Project
Managing a warehouse project involves seven broad categories of tasks:
• Determining organizational readiness for the warehouse
• Adopting an incremental approach to warehouse development
• Setting expectations
• Managing expectations
• Assembling the project team
• Estimating the data warehouse project
• Identifying critical success factors
These tasks are described in the sections that follow.

Determining Organizational Readiness for the Warehouse
1. Are the objectives and business drivers clearly defined, compelling, and agreed upon?
2. Have you selected a methodology for design, development, and implementation?
3. Is the project scope clearly defined, with a focus on business rather than technology?
4. Is there strong support from a business management sponsor?
5. Does the business management sponsor have specific expectations?
6. Are there cooperative relations between business and Information Systems staff?
7. Have you identified which source data will be used to populate the data warehouse?
8. What is the quality and “cleanliness” of the source data?
9. Are you authorized to choose and acquire hardware and software to implement the warehouse?
10. Are you prepared to select and train your implementation team?

Determining Organizational Readiness for the Warehouse
Before you commit time, money, staff, and other resources to your data warehouse project, it is essential that you assess the readiness of your organization for the warehouse. There are several good readiness checklists available in data warehousing textbooks. The ten questions above are a representative list of essential indicators that test an organization’s readiness. If your organization is significantly unprepared in light of these indicators, experience shows that the lack of readiness does not correct itself once the warehouse project starts. If your organization is not ready for or committed to the warehouse, it is best to delay the project rather than to start it and hope to catch up.
Data Warehousing Fundamentals 5-17 Lesson 5: Planning for a Successful Warehouse ..................................................................................................................................................... Setting Expectations Incremental Scope Rollout over time Phases Copyright  Oracle Corporation, 1999. All rights reserved. ..................................................................................................................................................... 5-18 Data Warehousing Fundamentals Managing a Warehouse Project ..................................................................................................................................................... Setting Expectations Expectations for each data warehouse project phase should be established early on. Every organization has heard something about data warehousing, data marts, data mining, and on and on. To set the expectations throughout the organization you first need to determine what each member of the organization is expecting from the data warehouse. Set Expectations for the Incremental Approach Educate all members of the organization in advance that the data warehouse project will be incrementally developed. Explain that there is no formal implementation of the entire data warehouse all at once. Help the user community to understand that the data warehouse provides views of the business over time and under continually changing strategic environments. ..................................................................................................................................................... Data Warehousing Fundamentals 5-19 Lesson 5: Planning for a Successful Warehouse ..................................................................................................................................................... Managing Expectations • • • Documenting Informing sponsors Reporting progress to end users Copyright  Oracle Corporation, 1999. 
Managing Expectations

Documenting Deliverables
Manage expectations throughout the data warehouse project management cycle by documenting the deliverables completed within each phase.

Keeping Sponsors Informed
Keep the executive sponsor of the warehouse, as well as the end-user community, abreast of the iterative development that takes place during each phase.

Reporting Incremental Progress to End Users
Highlight all new progress and functionality so that the user community knows about the incremental advances that increase the amount of information available from the data warehouse.
Assembling the Project Team
During the life cycle of a data warehouse project, you will need to call on staff from both the business and Information Systems sides of your organization. Project roles are often shared and switched over the project life cycle.

Project Manager/Project Leader
• Manages and defines the data warehouse project plan
• Is responsible for the overall design and function of the data warehouse
• Coordinates project resources, controls the budget, documents project status, resolves issues, coordinates vendor activity, and manages change control
On large data warehouse projects, the project manager and project leader are typically two different individuals.

Architect
• Designs and documents data warehouse architecture and technical infrastructure
• On a small data warehouse project, may also be responsible for integrating all networking products and host connectivity

Executive Sponsor
• Provides clout; influences resource availability, funding, and scheduling
• Provides understanding of the organization and its business

Data Analyst
• Is responsible for the data model and schema design
• Manages data quality, data integration, aggregation, and updates
• On a small data warehouse project, may also be involved in data extraction and transformation
• On a large data warehouse project, may also be involved in exploring end-user data requirements and deploying business intelligence and analysis tools

Database or System Administrator
• Is responsible for physical database implementation
• Installs all hardware and software products for the data warehouse environment
• Manages database installation, configuration, security, and administration
• May also help programmers with data extraction, transformation, loading, backup, and archiving
Estimating the Data Warehouse Project

Bottom-Up Project Estimate: Percentage of Project Effort
Process | Values listed for phases A to F | Total
Requirements definition | 3.2, .25, .79 | 4.1%
Data acquisition | .74, .23, 1.36, 6.69, 6.26, .85 | 16.1%
Architecture | 1, .59, .84, 2.22, 5.28 | 9.9%
Data quality | .2, .32, .39, 3.22 | 4.3%
Administration | .3, .12, .23, 4.51, 5.84, .2 | 11.0%
. . .

“How much will it cost?” and “When will it be ready?” are typically the first questions asked at the start of a data warehouse project. The most reliable way to answer them is to calculate a bottom-up project estimate.

Bottom-Up Project Estimate
A bottom-up estimate can be developed from a work breakdown structure that contains all the tasks to be performed, with project roles mapped to tasks and defined role percentages for task participation. The tasks and role mapping provide the infrastructure for documenting the estimating factors that influence each task. The estimating factors can then be used in an estimating formula for each task.
The Percentage of Project Effort table on the slide summarizes a bottom-up estimating model. Each cell represents the percentage of project effort spent on that process in that phase. Phase columns sum down to phase totals, and process rows sum across to process totals.
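The roll-up described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tasks, hours, estimating factors, and role shares below are hypothetical and are not the figures from the slide, and the per-task formula (base hours times an estimating factor times each role's participation share) is one simple choice among many. The phase names follow the course's key for columns A through F.

```python
from collections import defaultdict

# Phase names for columns A through F, as given in the key to the table.
PHASES = ["Strategy", "Definition", "Analysis", "Design", "Build", "Transition"]

# Hypothetical work breakdown structure (illustrative numbers only).
# Each task: process, phase, base hours, estimating factor, role participation.
TASKS = [
    ("Requirements definition", "Strategy", 40, 1.0,
     {"Data analyst": 0.5, "Project manager": 0.5}),
    ("Data acquisition", "Design", 80, 1.5, {"Data analyst": 1.0}),
    ("Data acquisition", "Build", 100, 1.2,
     {"Data analyst": 0.5, "Administrator": 0.5}),
    ("Architecture", "Analysis", 60, 1.0, {"Architect": 1.0}),
    ("Administration", "Transition", 20, 1.0, {"Administrator": 1.0}),
]


def task_hours(base_hours, factor, roles):
    """Assumed estimating formula: sum the hours each role contributes."""
    return sum(base_hours * factor * share for share in roles.values())


def effort_percentages(tasks):
    """Return {(process, phase): percent of total project effort}."""
    hours = defaultdict(float)
    for process, phase, base, factor, roles in tasks:
        hours[(process, phase)] += task_hours(base, factor, roles)
    total = sum(hours.values())
    return {cell: 100.0 * h / total for cell, h in hours.items()}


def totals(cells):
    """Sum cells across rows (process totals) and down columns (phase totals)."""
    by_process, by_phase = defaultdict(float), defaultdict(float)
    for (process, phase), pct in cells.items():
        by_process[process] += pct
        by_phase[phase] += pct
    return dict(by_process), dict(by_phase)


cells = effort_percentages(TASKS)
process_totals, phase_totals = totals(cells)
for process, pct in process_totals.items():
    print(f"{process}: {pct:.1f}%")
for phase in PHASES:
    print(f"{phase}: {phase_totals.get(phase, 0.0):.1f}%")
```

Because every cell is a share of the same project total, the process totals and the phase totals each add up to 100 percent, which is a quick consistency check on any table built this way.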
Following is a key to the table.

Column Label | Meaning
A | Strategy
B | Definition
C | Analysis
D | Design
E | Build
F | Transition

Recognizing Critical Success Factors
Each data warehouse project management phase has critical success factors. The critical success factors for the overall data warehouse project typically include these three items:
• Design the data warehouse with a focus on the business, not the technology. In a successfully managed data warehouse project there are no technical decisions, only business decisions.
• Use an iterative development methodology. Include short phases that provide frequent deliverables to help manage expectations throughout the project.
• Include end users on the project team. End-user input is necessary for design decisions that enable the data warehouse project to meet the business goals.
Identifying Planning Phases
Effective and efficient data warehouse project management involves the use of project phases. Project phases identify the tasks to be completed, the resources required, the directing and reporting efforts, and the quality assurance required before moving on to the next phase. Project phasing is a management technique used to focus project teams on a short-term goal and to communicate progress to senior management.

Phase | Goal
Strategy | Clearly define the business objectives and purpose of the data warehouse solution, while establishing an environment for incremental development. The strategy phase provides the enterprise vision for the data warehouse.
Scope | Clearly define the scope and objectives for the incremental development effort while complying with strategy. Initial models are created, data sources are documented, and the scope of data quality is defined. The technical architecture and data warehouse architecture are also created for the scoped solution.
Analysis | Formulate the detailed requirements for data acquisition, and the data access requirements for business analysis and decision making.
Design | Take the requirements from the analysis phase and translate them into detailed design specifications, while accounting for the technical architecture, data warehouse architecture, and available technology.
Build | Create and test the database structures, data acquisition modules, warehouse administration modules, metadata modules, data access modules, and reports and queries.
Production | Install the incremental solution, prepare the client personnel to use and manage the solution, go to production, and begin managing the growth and maintenance of the warehouse.

Identifying Warehouse Strategy Phase Deliverables
Each data warehouse project phase has deliverables. The deliverables for the strategy phase focus on defining the business objectives and purpose of the data warehouse solution. The purpose and objectives for the total data warehouse solution are essential to setting and managing expectations. The strategy phase also clearly defines the data warehouse team and the executive sponsor.

Strategy Deliverable | Description
Business goals and objectives | Documents the strategic business goals and objectives
Data warehouse purpose, objectives, and scope | Documents the purpose and objectives of the enterprise data warehouse, its scope, and how it is intended to be used
Enterprise data warehouse logical model | High-level, logical information model that diagrams the major entities and relationships for the enterprise
Incremental milestones | Documents a realistic scope of the data warehouse, acceptable delivery milestones for each increment, and source data availability
Source system data flows | Outlines source system data, where it originates, the flow of data between business functions and source systems, degree of reliability, and data volatility
Subject area gap analysis | Documents the variance between the information requirements and the ability of the data sources to provide the information
Data acquisition strategy | Documents the approach for extracting, transforming, transporting, and loading data from the source systems to the target environments for the initial load and subsequent refreshes
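The subject area gap analysis listed among the strategy deliverables compares what the business needs with what the source systems can supply. A minimal sketch of that comparison follows; it assumes requirements and source contents can be written down as sets of attribute names per subject area, and every subject area and attribute name in it is hypothetical.

```python
# Hypothetical information requirements, per subject area.
REQUIRED = {
    "Sales": {"order_date", "product", "revenue", "channel"},
    "Customer": {"name", "segment", "region"},
}

# Hypothetical attributes the source systems can actually provide.
AVAILABLE = {
    "Sales": {"order_date", "product", "revenue"},
    "Customer": {"name", "region", "phone"},
}


def gap_analysis(required, available):
    """Return, per subject area, the required attributes no source provides."""
    gaps = {}
    for area, needed in required.items():
        missing = needed - available.get(area, set())
        if missing:
            gaps[area] = sorted(missing)
    return gaps


gaps = gap_analysis(REQUIRED, AVAILABLE)
for area, missing in gaps.items():
    print(f"{area}: missing {', '.join(missing)}")
```

A real gap analysis would also weigh data reliability and volatility, which the source system data flows deliverable records; the set difference captures only whether the information exists at all.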
Identifying Warehouse Strategy Phase Deliverables (continued)

Strategy Deliverable | Description
Data warehouse architecture | Documents the set of rules or structures providing the framework for the centralized data warehouse, data marts, metadata repository, fact tables, multidimensional structures, and data access components
Technical infrastructure | Outlines the technologies, platforms, databases, gateways, and other components necessary to make the architecture functional
Data access environment | Documents the identification, selection, and design of tools that support end-user access to the warehouse data
Data quality strategy | Outlines the approach for data management, error and exception handling, data cleansing, and the audit and control of the data
Data warehouse administration strategy | Documents the warehouse administration tasks and considerations such as version control, archive, backup, and analysis of metadata and query profiles for optimization
Metadata strategy | Documents the strategy for capturing, integrating, and accessing metadata for all components of the warehouse environment
Training strategy | Outlines the development and end-user training requirements, identifies the technical and business personnel requiring training, and establishes time frames for executing the training plans

Defining the Warehouse Project Scope
• Focus on the business, not the technology
• Break down the project into manageable phases
• Encourage rapid turnaround on deliverables
• Always include the end users on the team

Identifying Project Scope Phase Deliverables

Defining the Warehouse Project Scope
Without a complete understanding of the business objectives and scope of the overall warehouse, project staff will not be able to proceed successfully.

Focus on the Business, Not the Technology
Iterative development requires discipline in scoping deliverables. A clear business focus, rather than technology considerations, should drive scope. A realistic scope that produces deliverables in short time frames helps ensure success and continued management commitment to the data warehouse implementation.

Break the Project Down into Manageable Phases
One challenge in defining manageable phases is dealing with numerous tasks coupled to numerous interdependencies, all occurring within a short time frame.
Breaking this complexity down into manageable pieces helps the project succeed.

Define Deliverables
As each phase is broken down into a collection of processes, define the expected deliverables for each task.

Involve End Users
Iterative development works only when users are active participants on the delivery team. In a data warehouse project there should be no technical decisions, only business decisions. Business requirements drive all technical decisions.

Defining the Warehouse Project Scope (continued)
The deliverables for the scope phase focus on clearly defining the scope and objectives for the incremental development effort. Initial models are created, data sources are documented, and the scope of data quality is clearly defined. The technical architecture and data warehouse architecture are also created.
Scope Deliverable | Description
Business requirements definition | Documents the objectives and defines the development efforts for the business requirement task (The scope clearly outlines the requirements, functionality, expected benefits, and costs of the solution. Success criteria and business constraints are also documented.)
Data sources | Outlines the operational and external data source systems, hardware and software platforms, types of data in the system, frequency and source of updates
Load and refresh plans | Documents how extraction, transformation, and transportation will be performed
Technical architecture | Documents capacity planning, interface requirements, hardware architecture, software, tools, and configuration requirements
Data warehouse architecture | Outlines the database objects, data access components, and metadata repository
Defining the Warehouse Project Scope (continued)

Scope Deliverable | Description
Data quality | Documents the plan for data cleansing and scrubbing, error and exception handling, auditing, and feeding back corrected data to source systems
Warehouse administration plan | Documents the tasks, resources, and time frames for producing the warehouse administration functionality
Metadata integration plan | Outlines the tasks, resources, and time frames needed to ensure the metadata is integrated with the data warehouse components
Data access plan | Documents the data access tasks for implementing an existing tool or developing a system to provide access capabilities
Training plan | Outlines the training needed to support the tasks of the current phase

Summary
This lesson discussed the following topics:
• Cultivating management support, both financial and political, for the warehouse
• Developing a realistic scope that produces deliverables in short time frames to help ensure success
• Assessing your organization’s readiness for a data warehouse
• Setting realistic expectations
Practice 5-1 Overview
This practice covers the following topics:
• Generating a warehouse organizational readiness checklist
• Generating a warehouse strategy deliverables checklist
• Generating a warehouse project scope deliverables checklist

Practice 5-1

Warehouse Organizational Readiness Checklist
1. For each item in the following list that measures warehouse readiness, rate your own organization’s readiness. Rate each item’s relative importance in measuring your organization’s readiness.

Readiness Measure | Your Organization’s Readiness
Are the objectives and business drivers clearly defined, compelling, and agreed upon? |
Have you selected a methodology for design, development, and implementation? |
Is the project scope clearly defined, with a focus on business rather than technology? |
Is there strong support from a business management sponsor? |
Does the business management sponsor have specific expectations? |
Are there cooperative relations between business and Information Systems staff? |
Have you identified which source data will be used to populate the data warehouse? |
What is the quality and “cleanliness” of the source data? |
Are you authorized to choose and acquire hardware and software to implement the warehouse? |
Are you prepared to select and train your implementation team? |

Warehouse Strategy Deliverables Checklist
2. Form into small groups, and consider each of the following strategy deliverables. For each deliverable, discuss briefly whether you would use it in your own strategy checklist back at your workplace, and rate its importance relative to the other deliverables.

Strategy Deliverable | Description | Will You Use? | Why?
Business goals and objectives | Documents the strategic business goals and objectives | |
Data warehouse purpose, objectives, and scope | Documents the purpose and objectives of the enterprise data warehouse, its scope, and how it is intended to be used | |
Enterprise data warehouse logical model | High-level, logical information model that diagrams the major entities and relationships for the enterprise | |
Incremental milestones | Documents a realistic scope of the data warehouse, acceptable delivery milestones for each increment, and source data availability | |
Source system data flows | Outlines source system data, where it originates, the flow of data between business functions and source systems, degree of reliability, and data volatility | |
Subject area gap analysis | Documents the variance between the information requirements and the ability of the data sources to provide the information | |
Data acquisition strategy | Documents the approach for extracting, transforming, transporting, and loading data from the source systems to the target environments for the initial load and subsequent refreshes | |
Warehouse Strategy Deliverables Checklist (continued)

Strategy Deliverable | Description | Will You Use? | Why?
Data warehouse architecture | Documents the set of rules or structures providing the framework for the centralized data warehouse, data marts, metadata repository, fact tables, multidimensional structures, and data access components | |
Technical infrastructure | Outlines the technologies, platforms, databases, gateways, and other components necessary to make the architecture functional | |
Data access environment | Documents the identification, selection, and design of tools that support end-user access to the warehouse data | |
Data quality strategy | Outlines the approach for data management, error and exception handling, data cleansing, and the audit and control of the data | |
Data warehouse administration strategy | Documents the warehouse administration tasks and considerations such as version control, archive, backup, and analysis of metadata and query profiles for optimization | |
Metadata strategy | Documents the strategy for capturing, integrating, and accessing metadata for all components of the warehouse environment | |
Training strategy | Outlines the technical and business personnel requiring training, and establishes time frames for executing the training plans | |

Warehouse Project Scope Deliverables Checklist
3. Staying in your small group, discuss each of the following project scope deliverables. For each deliverable, discuss briefly whether you would use it in your own project scoping checklist back at your workplace, and rate its importance relative to the other deliverables.
Scope Deliverable | Description | Will You Use? | Why?
Business requirements definition | Documents the objectives and defines the development efforts for the business requirement task (The scope clearly outlines the requirements, functionality, expected benefits, and costs of the solution. Success criteria and business constraints are also documented.) | |
Data sources | Outlines the operational and external data source systems, hardware and software platforms, types of data in the system, frequency and source of updates | |
Load and refresh plans | Documents how extraction, transformation, and transportation will be performed | |
Technical architecture | Documents capacity planning, interface requirements, hardware architecture, software, tools, and configuration requirements | |
Data warehouse architecture | Outlines the database objects, data access components, and metadata repository | |

Warehouse Project Scope Deliverables Checklist (continued)

Scope Deliverable | Description | Will You Use? | Why?
Data quality | Documents the plan for data cleansing and scrubbing, error and exception handling, auditing, and feeding back corrected data to source systems | |
Warehouse administration plan | Documents the tasks, resources, and time frames for producing the warehouse administration functionality | |
Metadata integration plan | Outlines the tasks, resources, and time frames needed to ensure that the metadata is integrated with the data warehouse components | |
Data access plan | Documents the data access tasks for implementing an existing tool or developing a system to provide access capabilities | |
Training plan | Outlines the training needed to support the tasks of the current phase | |
Lesson 6: Analyzing User Query Needs

Course road map: Defining DW Concepts & Terminology; Planning for a Successful Warehouse; Meeting a Business Need; Choosing a Computing Architecture; Planning Warehouse Storage; Modeling the Data Warehouse; ETT (Building the Warehouse); Analyzing User Query Needs; Managing the Data Warehouse; Supporting End User Access; Project Management (Methodology, Maintaining Metadata)
Overview
The previous lesson covered planning for a successful warehouse. This lesson discusses analyzing user query needs. Note that the “Analyzing User Query Needs” block is highlighted in the course road map above.
Specifically, this lesson covers the analysis required to identify and categorize the users who may need to access data from the warehouse, and it helps you determine how their requirements differ. Data access and reporting tools are also considered.

Types of Users
• Executives
• Managers
• Business analysts

User Access
Types of Users
• Executives
• Casual users or managers
• Business analysts or power users
[Figure labels: Structured, Unstructured]

Types of Users
In any warehouse environment, the user communities and their query requirements vary according to their roles and responsibilities.
Types of Users | Definition
Executives | They are in charge of the business and have overall responsibility for controlling the business at an enterprise level, determining profitability, competitiveness, and strategy. They need to see bottom-line figures.
Casual users or managers | They are in charge of a smaller component of the business and need information to manage the profitability, direction, and planning of that subset of the business. They also need to see the enterprise-wide picture in order to fit localized plans into the corporate goal.
Business analysts or power users | They have a solid understanding of the business process and also have a technical understanding of dimensional modeling and SQL, which are required to extract the answers to business questions from the data warehouse and produce the reports needed by the managers and executives. They often function as a liaison between business and technical groups.

Requirements
• They may interface to the warehouse only through printed reports, although these users will experience the power of the data warehouse as the reports become more accurate, more consistent, and easier to produce.
• Their needs drive the development of the applications, the architecture of the warehouse, the data it contains, and the priorities for implementation.
• They need an easy-to-use tool that helps them specify what they want to see and that determines on its own how to produce the desired results.
• The tool must allow construction of all the reporting elements without being too complicated.
• A single interface and invisible multipass SQL are critical.
• They need a tool that reflects the way they would break down and solve the business problem.
• The tool should handle reporting elements such as ranking and comparison across summary levels.
Gathering User Requirements
Areas to focus on:
• How users do business and what the business drivers are
• What attributes users need (required versus good to have)
• What the business hierarchies are
• What data users use and what they would like to have
• What levels of detail or summary are needed
• What type of front-end data access tool is used
• How users expect to see the query results

Gathering User Requirements: Possible Obstacles
The following are some of the possible obstacles:
• The business objective of the data warehouse has not been specifically defined
• The scope of the data warehouse is too broad
• There is a misunderstanding about the purpose and function of decision support systems and operational systems

Gathering User Requirements
You must approach data warehouse end-user requirements gathering in a radically different way than with operational systems. The following are the areas to focus on when gathering user requirements.
• How users do business
• What the business drivers are
• What attributes users need
• Which attributes are absolutely required and which attributes are good to have
• What the business hierarchies are
• What data users use now and what they would like to have
• What levels of detail or summary the users need
• What type of front-end data access tool will be used
• How the users expect to see the results of their queries

The following are some of the possible obstacles to gathering user requirements.
• The business objective of the data warehouse has not been specifically defined
• The scope of the data warehouse is too broad
• There is a misunderstanding about the purpose and function of decision support systems and operational systems

Data Access Tool Requirements
• Simple reports
• Complex trend analysis
• Regression analysis
• Multidimensional data analysis
• Exception reporting
• Forecasting
• Data manipulation
• Data mining
• Parameterized reports for batch execution
• Web-based or client-server-based (or both)

Data Access Strategy
• Define user requirements early
• Determine the choice of tools early
• Identify user roles and access requirements
Managing User Data Access

Data Access Tool Requirements
The front-end tools must be able to associate the common business terms used on a day-to-day basis with clear, easy-to-understand data definitions. This enables the users to use the product quickly, without the need for extensive training. Metadata provides definitions of the data that the user can understand, in simple, straightforward business terminology.
The tool must be flexible enough to meet different reporting requirements, such as:
• Simple reports
• Complex trend analysis
• Regression analysis
• Multidimensional data analysis
• Exception reporting
• Forecasting
• Data manipulation
• Data mining
• Parameterized reports for batch execution
• Web-based or client-server-based (or both)

Data Access Strategy
Because the data, and access to that data, are so important to warehouse users, the choice of tools that users will employ is a primary concern and must be determined early in the definition of the data warehouse.

User Query Progression
• Starts simple
• Becomes more analytical
• Requires different techniques and flexible tools
[Figure: a “What?” question followed by successive “Why?” questions]
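The progression from a simple "what" question to more analytical follow-up questions can be sketched as a series of queries, each refining the one before. The example below is only an illustration: it uses Python's standard `sqlite3` module as a stand-in for a warehouse, and the `sales` table, its columns, and all figures are hypothetical.

```python
import sqlite3

# Hypothetical sales data; the table, columns, and numbers are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, state TEXT, month TEXT, units INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("tennis racket", "FL", "1999-02", 120),
        ("tennis racket", "FL", "1999-03", 310),
        ("tennis racket", "CA", "1999-02", 150),
        ("tennis racket", "CA", "1999-03", 180),
        ("tennis racket", "NY", "1999-03", 90),
        ("golf club", "FL", "1999-03", 40),
    ],
)

# Step 1, the simple "what" question: total units for one product in a period.
total = conn.execute(
    "SELECT SUM(units) FROM sales "
    "WHERE product = ? AND month BETWEEN ? AND ?",
    ("tennis racket", "1999-01", "1999-06"),
).fetchone()[0]

# Step 2, a follow-up: break the same figure down by month to see when it moved.
by_month = conn.execute(
    "SELECT month, SUM(units) FROM sales "
    "WHERE product = ? GROUP BY month ORDER BY month",
    ("tennis racket",),
).fetchall()

# Step 3, a further follow-up: which state sold the most units?
top_state = conn.execute(
    "SELECT state, SUM(units) AS total_units FROM sales "
    "WHERE product = ? GROUP BY state ORDER BY total_units DESC LIMIT 1",
    ("tennis racket",),
).fetchone()

print(total, by_month, top_state)
```

Each step reuses the same data at a different grouping level, which is why a tool that supports only fixed reports tends to fall short once users start asking follow-up questions.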
User Query Progression
The tools that you employ must provide the flexibility to answer a user’s immediate and future needs. The answer to a question may not be immediately obvious, and one question can often lead to another. Querying the warehouse is an iterative process.
For example, a user may start with a query that answers a reasonably simple question, such as:
“What are the sales figures for Sprock tennis rackets during the first half of 1999 in the U.S.A. as a whole?”
Once the query is answered, the user may start to ask more analytical questions, such as:
“Why did the sales figures for Sprock tennis rackets in the U.S.A. increase during that period?”
The answer proves to be that the World Tennis Championships ran in Miami in March 1999. Obviously, tennis caught everyone’s attention. Now that the answer to that is known, the process continues:
“Which U.S. state sold the most Sprock tennis rackets? Why?”
To answer these types of questions, the user needs to be able to analyze data in a number of different ways.

Training
• Methods
  – Informal: one-to-one or small class
  – Formal: larger class
  – Self-study
• Basic topics
  – Logging on
  – Accessing metadata
  – Creating and submitting a query
  – Interpreting results
  – Saving queries and storing results
  – Utilizing resources
  – Learning warehouse fundamentals
[Figure labels: ILT, IDL, CBT]

Training the Users

Training Methods
Users must be trained in using the system you have put in place. There are a number of ways of teaching. The common methods are:
• Informal sessions with a small number of users who can disseminate the information after the class (Typically the sessions are on a one-to-one basis, as there are few real users of the warehouse initially.)
• Formal sessions in a classroom environment with larger numbers of students
• Self-study using interactive video, computer-based training (CBT), or reference manuals

Fundamental Training Topics
The basic training should include some of the following fundamental topics:
• How to switch on the hardware and log on to the data warehouse
• How to find out what data is there (access the metadata) and interpret its meaning
• How to create and issue a query
• How to prioritize queries
• How to monitor query execution
• How to interpret query results
• How to save the query and store results
• How resources are used within the query environment, particularly in an environment where query governors are used (as in a warehouse)
• How the warehouse works:
  – Where the data comes from
  – The level of data quality and integrity (or lack of it)
  – What mapping is and why it is important
  – Backup and recovery responsibilities (if any)
  – Data and query availability
  – Scheduled downtime
Query Efficiency
User considerations
• Successful completion
• Faster query execution
• Less CPU used
• More opportunity for further analysis

Query Efficiency
Designer considerations
• Use indexes
• Select minimum data
• Employ resource governors
• Minimize bottlenecks
• Develop metrics
• Use prepared and tested queries
• Use quiet periods

Query Efficiency

User’s Perspective
From a user’s perspective, an efficient query has the following characteristics:
• Runs successfully and completely, and produces the desired results
• Takes less time to run and is therefore more beneficial to productivity
• Uses less CPU power and therefore costs less if charges are levied
• Enables the user to move more quickly on to further analysis

Designer’s Role
Efficient query access depends on good design of the data warehouse. The following points are important to ensure query efficiency:
• Create indexes on key values to minimize full-table scans.
• Select only the minimum amount of data required.
• Administer resource governors on the server to:
  – Prevent access
  – Cut off a query after it has run for a specified time
  – Inform the user how long a query will take
  (Resource governors may be set for the entire application or by user group. Governors are vital where data volumes are very large.)
• Minimize intensive I/O bottlenecks.
• Develop metrics to support queries.
• Make more use of prepared and tested queries.
• Submit large jobs outside working hours, or when CPU usage, network, and I/O contention are minimal.

Note: The Database Resource Manager in Oracle8i provides you with the ability to control and limit the total amount of processing resources available to a given user or set of users. Using this facility, you will be able to:
• Guarantee certain users a minimum amount of processing resources regardless of the load on the system and the number of users
• Distribute available processing resources by allocating percentages of CPU time to different users and applications
• Limit the degree of parallelism that a set of users can use
• Configure an instance to use a particular method of allocating resources
• Select the priority from a given set of priorities that the DBA has assigned to the user

Charge Models
• Examples of charge models:
  – Flat allocation model
  – Transaction-based model
  – Telephone service model
  – Cable TV model
• Develop your own unique model
• Avoid a charge model that discourages users from using the warehouse
Charging for Data Warehouse Access
At some point the IT department might need to start charging user groups for data warehouse usage, as a way of obtaining continuous funding for the data warehouse initiative. Chargeback schemes work only if there are reliable mechanisms to track and monitor usage of the warehouse per user.

Charge Models
There are a number of different models that may be used to charge for services. Some examples are:
• Flat allocation model: The cost is allocated by a central group (Financial Controller) based on the percentage of resources used by the organization, such as office space, number of users, and budgets.
• Transaction-based model: The cost is based on query usage, which may mean calculations based on CPU use, I/O, data, or table elements accessed and reported.
• Telephone service model: The cost is based on connection time.
• Cable TV model: The cost is based on simple standard service charges plus charges for special services.
Some of these models may not apply to your installation; you may consider developing a unique model based on your own requirements.
Note: Whatever model you employ should balance the needs of the users to access the data they need against the cost of that data, without discouraging use.
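As a rough illustration of how the four charge models above differ, the following Python sketch computes a period charge for one user group under each model. Every name, rate, and usage figure here is an assumption made up for the example; a real chargeback scheme would take its inputs from whatever usage-tracking mechanism the installation has.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    """One user group's usage for a billing period (all fields hypothetical)."""
    resource_share: float   # fraction of shared resources attributed to the group
    cpu_seconds: float      # CPU time consumed by the group's queries
    connect_hours: float    # total connection time
    special_services: int   # number of premium services subscribed to


def flat_allocation(total_cost: float, usage: Usage) -> float:
    # Flat allocation: a central group splits the total cost by resource share.
    return total_cost * usage.resource_share


def transaction_based(usage: Usage, rate_per_cpu_second: float = 0.02) -> float:
    # Transaction-based: cost follows query usage (CPU time only in this sketch).
    return usage.cpu_seconds * rate_per_cpu_second


def telephone_service(usage: Usage, rate_per_hour: float = 1.5) -> float:
    # Telephone service: cost follows connection time.
    return usage.connect_hours * rate_per_hour


def cable_tv(usage: Usage, base_fee: float = 200.0,
             fee_per_service: float = 75.0) -> float:
    # Cable TV: standard service charge plus a charge per special service.
    return base_fee + usage.special_services * fee_per_service


marketing = Usage(resource_share=0.25, cpu_seconds=10_000,
                  connect_hours=120, special_services=2)

charges = {
    "flat allocation": flat_allocation(40_000.0, marketing),
    "transaction-based": transaction_based(marketing),
    "telephone service": telephone_service(marketing),
    "cable TV": cable_tv(marketing),
}
for model, amount in charges.items():
    print(f"{model}: {amount:.2f}")
```

Comparing the results side by side for real usage data is one way to check whether a candidate model would penalize, and so discourage, the heaviest legitimate users.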
Query Scheduling and Monitoring
• Query scheduling
  – Manages information usage
  – Directs queries
  – Executes queries
  – Sets job queue priorities
• Query monitoring
  – Track resource-intensive queries
  – Detect unused queries
  – Catch queries that use summary data inefficiently
  – Catch queries that perform regular summary calculations at the time of query execution
  – Detect illegal access

Query Management and Monitoring Tools
• Use tools, schedulers, Oracle Enterprise Manager
• Consider
  – Automation levels
  – Technology interfaces
  – Cost

Managing Queries

Query Scheduling
Once the warehouse is operational, queries are submitted to the warehouse server. You need to create a process that:
• Manages the use of information in the data warehouse
• Directs queries to the appropriate data source, using metadata
• Schedules the execution of a query
• Sets job queue priorities

Query Monitoring
You need to keep a check on warehouse query activity. The query management program (or tool) must:
• Track resource-intensive queries, which require analysis to identify why they are so resource-intensive, followed by tuning to improve performance.
• Detect queries that are never used and remove them. Make sure that users are advised of this kind of change.
• Catch queries that use summary data inefficiently; the summary strategy may need revision.
• Catch queries that perform regular summary calculations at the time of query execution. You may decide to add another summary table with the presummarized data to the data warehouse; this provides immediate access and improves overall speed of access.
• Detect illegal access. A user may need access to currently denied data.

Query Management and Monitoring Tools
For scheduling, you can use custom programs developed in-house, a UNIX scheduler, third-party tools, or Oracle Enterprise Manager.
For monitoring, you may use the DSS tools themselves (where they have the capability), tools developed in-house, and server management products such as Oracle Enterprise Manager.
Consider the automation levels, technology interfaces, and cost of the query management and monitoring tools before purchasing them.

Security
• Do not overlook
• Subject area sponsors:
  – Review and authorize requests for access rights
  – Identify enhancements
• Transparent security
• Easy to implement, maintain, and manage

Security Plan
• Define a strategy:
  – Allocate business area owners
  – Ensure invisibility
• Ensure easy management
• Consider auditing
• Manage passwords

Security
Security is commonly controlled by the database administrator (DBA).
It must be considered early in development to ensure that access to key information resources is controlled. Information is a key company resource that needs protection. Therefore, never assume that you can overlook security because user access is query-only.
There are some simple guidelines on security that you can follow:
• Ensure that each subject area has a sponsor who can carry out the following tasks:
  – Review and authorize requests for access rights
  – Identify further enhancements to the security setup (Data may be separated into that which is accessible to all users and that which is accessible to a select few.)
• Ensure that the security is transparent and does not impair access from the user perspective
• Ensure that the strategy is easy for you to implement, maintain, and manage

Security Plan
• Allocate an owner to every business area within the warehouse. The owner should be able to advise what access any requestor should be given and define the data that can be made available publicly, compared with data that must be restricted.
• Ensure that the security levels are virtually invisible to the users.
• Ensure that you can manage and administer the security simply and define a clear, simple strategy for:
  – Access requests
  – Allocating predefined roles, both public and restricted, to subject areas
  – Auditing to identify unauthorized access attempts
  – Password management

Role-Based Security
• Subject area access:
  – Summary data for new users
  – All data for experienced users
• Departmental access
• Limited object access
• Access during load
Application Context and Fine-Grained Access Control in Oracle8i
[Figure labels: “Who am I? Where am I?”, Application context, Access policy, Table]

Role-Based Security
Use database roles, the same technique that you would use in an operational environment. However, you need to implement role-based security somewhat differently, because of the differences in the way the warehouse and operational systems work. For example, you should set up roles that do the following jobs:
• Provide users with access to specific subject areas
• Provide users with access by department
• Limit access to specific objects within any subject area
• Control access when loading data (You need a role to REVOKE and a role to GRANT if you are using Oracle databases.)

Fine-Grained Access Control in Oracle8i
Fine-grained access control gives customers a way to extend their table-based and view-based security to finer levels of granularity than previously possible. It is implemented by attaching security policies to tables or views. These security policies can limit access by users to only specific rows within the table or view.

Application Context
Application context is a feature related to fine-grained access control that can be used to implement a security policy. It is provided so that customers who want to use fine-grained access control can base their security policies on information about the user, such as who the user is, which machine the user is working on, and what the user’s management hierarchy is. Application context provides a secure framework to store such information so that it may be used to implement access to database objects.
The justification for fine-grained access control is as follows:
• Application-based security can be bypassed.
• Views work best for a limited number of user groups.
• Internet and remote access demand data-driven, user-based security.
• Privacy requirements must be met (for example, in medical, human resources, and defense applications).
• Building security in one place reduces cost of ownership.

Comparing OLAP and DSS
• OLAP is used for multidimensional analysis.
• DSS provides a system enabling decision making.
• OLAP tools provide a DSS capability.
• OLAP for the warehouse provides analytical power.
• Other terms:
  – EIS
  – KBS

The Functionality of OLAP
• Rotate and drill down to successive levels of detail.
• Create and examine calculated data interactively on large volumes of data.
• Determine comparative or relative differences.
• Perform exception and trend analysis.
• Perform advanced analytical functions, for example, forecasting, modeling, and regression analysis.
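The rotate and drill-down operations listed above can be illustrated on a very small cube. The sketch below is plain Python rather than any OLAP product's API; the dimension names, cell values, and the `aggregate` helper are all hypothetical and exist only to show the idea.

```python
from collections import defaultdict

# Hypothetical three-dimensional cube: each cell holds units sold.
DIMENSIONS = ("product", "region", "quarter")
cube = {
    ("racket", "East", "Q1"): 100,
    ("racket", "East", "Q2"): 140,
    ("racket", "West", "Q1"): 80,
    ("racket", "West", "Q2"): 90,
    ("balls", "East", "Q1"): 30,
    ("balls", "West", "Q2"): 45,
}


def aggregate(cells, keep):
    """Sum the measure over every dimension that is not named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    totals = defaultdict(int)
    for key, value in cells.items():
        totals[tuple(key[i] for i in positions)] += value
    return dict(totals)


# Summary level: one figure per product.
by_product = aggregate(cube, ("product",))

# Drill down: add the region dimension to see the detail behind each product.
by_product_region = aggregate(cube, ("product", "region"))

# Rotate: present the same cells with region on the first axis instead.
by_region_product = aggregate(cube, ("region", "product"))

print(by_product)
print(by_product_region)
print(by_region_product)
```

Drilling down keeps the totals consistent (the regional figures for a product add up to its summary figure), while rotating changes only the orientation of the view, not the values.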
OLAP
The term online analytical processing (OLAP) was coined by Dr. E. F. Codd to describe a technology that could bridge the gap between personal computing and enterprise data management.
Decision support systems (DSS) are systems that enable decision makers in organizations to access data relevant to the decisions they are required to make.
The definitions of OLAP and DSS are often confused with each other.

Comparing OLAP and DSS

OLAP
Online analytical processing covers a wide spectrum of usage and a wide variety of requirements. It has a number of different definitions, such as a loosely defined set of principles that provide a dimensional framework for decision support. Essentially, OLAP is a flexible analytical tool that is commonly used to analyze and interpret data in a data warehouse or data mart.

DSS
Decision support systems are not new. They have been around for many years. In an earlier lesson, you saw that decision support systems were provided with information obtained from data extract processing. DSS, therefore, provide users with data, enabling decision making. They may or may not be a data warehouse or data mart. They may be based on an operational environment, or on an operational environment with data extracts used for specific decision-making activities.
There is little distinction between decision support and online analytical processing. Online analytical processing tools provide a decision support capability. Both online analytical processing tools and decision support query and reporting tools provide the means for informed decision making.

OLAP and DSS Tools for the Warehouse
Ultimately, online analytical processing tools and decision support tools that are designed to access warehouse data are more flexible and more capable of true analysis than the standard reporting tools typically used to access relational operational data.

The Functionality of OLAP
OLAP provides much more than the ability to rotate or drill down.
It offers the ability to create and examine calculated data interactively on large volumes of data, the ability to determine comparative or relative differences, and the ability to perform exception and trend analysis on calculated data. Some of the advanced analytical functions of OLAP are forecasting, modeling, regression analysis, and solving simultaneous equations.
Note: OLAP and DSS are also referred to as EIS (executive information systems) or KBS (knowledge-based systems).

Original OLAP Rules
1. Multidimensional conceptual view
2. Transparency
3. Accessibility
4. Consistent reporting performance
5. Client-server architecture

Original OLAP Rules
6. Generic dimensionality
7. Dynamic sparse matrix handling
8. Multiuser support
9. Unrestricted cross-dimensional operations
10. Intuitive data manipulation
11. Flexible reporting
12. Unlimited dimensions and aggregation levels

Original 12 OLAP Rules of Dr. E. F. Codd
The OLAP rules were originally defined by Dr. E. F. Codd. He saw the need for a model that was more suitable for mapping to the way analysts understand the business.
1 Multidimensional conceptual view: A tool should provide users with a multidimensional model that corresponds to the business problems and is intuitively analytical to use.
2 Transparency: The OLAP system’s technology, the underlying database and computing architecture, and the heterogeneity of input data sources should be transparent to users to preserve their productivity and proficiency with familiar front-end environments and tools.
3 Accessibility: The OLAP system should access only the data actually required to perform the analysis. Additionally, the system should be able to access data from all heterogeneous enterprise data sources required for the analysis.
4 Consistent reporting performance: As the number of dimensions and the size of the database increase, users should not perceive any significant degradation in performance.
5 Client-server architecture: The OLAP system has to conform to client-server architectural principles for the best price/performance, flexibility, adaptivity, and interoperability.
6 Generic dimensionality: Every data dimension must be equivalent in both structure and operational capabilities.
7 Dynamic sparse matrix handling: The OLAP system has to be able to adapt its physical schema to the specific analytical model that optimizes sparse matrix handling to achieve and maintain the required level of performance.
8 Multiuser support: The OLAP system must be able to support a workgroup of users working concurrently on a specific model.
9 Unrestricted cross-dimensional operations: The OLAP system must be able to recognize dimensional hierarchies and automatically perform associated roll-up calculations within and across dimensions.
10 Intuitive data manipulation: Consolidation path reorientation, drill-down and roll-up, and other manipulations should be accomplished through direct point-and-click, drag-and-drop actions on the cells of the cube.
11 Flexible reporting: The ability to arrange rows, columns, and cells in a fashion that facilitates analysis by intuitive visual presentation of analytical reports must exist.
12 Unlimited dimensions and aggregation levels: Depending on business requirements, an analytical model may have a dozen or more dimensions, each having multiple hierarchies. The OLAP system should not impose any artificial restrictions on the number of dimensions or aggregation levels.

Relational Database Model

        Attribute 1   Attribute 2   Attribute 3   Attribute 4
        Name          Age           Gender        Emp No.
Row 1   Anderson      31            F             1001
Row 2   Green         42            M             1007
Row 3   Lee           22            M             1010
Row 4   Ramos         32            F             1020

The table above illustrates the employee relation.

Multidimensional Database Model
[Diagram: two cubes. SALES has the dimensions Customer, Store, Product, and Time; FINANCE has the dimensions Store, Time, and GL_Line.]
The data is found at the intersection of dimensions.

Comparing Relational and Multidimensional Database Models
Before examining online analytical processing in any more detail, you should consider the difference between relational and multidimensional (OLAP) database models.

The Relational Database Model
A relation is a two-dimensional table.
Each row in the table holds data that pertains to some thing or a portion of some thing. Each column of the table contains data regarding an attribute. Sometimes rows are called tuples and columns are called attributes. For example, the Relational Database Model slide shows a sample table. Notice that it has four rows (tuples), each made up of four columns (attributes).

The Multidimensional Database Model
You can visualize the data model for a multidimensional database as a cube (the equivalent of a table in a relational database). Each cube has several dimensions (equivalent to index fields in relational tables). The cube acts like an array in a conventional programming language. Logically, the space for the entire cube is preallocated. To find or insert data, you use the dimension values to calculate the position. For example, sales for Product P2, Store London, and Time Jan97 may be in position [2,50,13]. In practice, a multidimensional product would have techniques to compress the amount of disk space used.

In the diagram, the database contains two cubes. Sales is a four-dimensional cube of information collected over time by store, product, and customer. The Financial information cube is three-dimensional, collected by time, store, and general ledger account line. The store and time dimensions are common to the two cubes. Because the database can contain many cubes, this approach is sometimes referred to as multicube storage.

A cube can also be a formula rather than a variable. In this case the cube is stored as a calculation formula such as Profit = Revenue – Expenses, and the data is calculated on demand from the stored cubes for revenue and expenses. This is like a view in a relational system.

The power of this model is the high degree of analysis it puts at your fingertips when combined with online analytical processing tools.
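
The array-like addressing and the formula cube described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names Cube and FormulaCube, the row-major layout, the zero-based positions, and the tiny member lists are all hypothetical and are not taken from any Oracle product.

```python
class Cube:
    """A dense, logically preallocated cube addressed by dimension values."""

    def __init__(self, dimensions):
        # dimensions: mapping of dimension name -> ordered list of members
        self.names = list(dimensions)
        self.members = [list(dimensions[name]) for name in self.names]
        # Map each member to its ordinal position within its dimension.
        self.ordinals = [{m: i for i, m in enumerate(ms)} for ms in self.members]
        self.shape = [len(ms) for ms in self.members]
        size = 1
        for extent in self.shape:
            size *= extent
        # Space for every possible cell is reserved up front.
        self.cells = [None] * size

    def position(self, **coords):
        """Per-dimension position of a cell, such as [1, 0, 0]."""
        return [self.ordinals[i][coords[name]] for i, name in enumerate(self.names)]

    def offset(self, **coords):
        """Row-major offset of the cell within the flat array."""
        off = 0
        for pos, extent in zip(self.position(**coords), self.shape):
            off = off * extent + pos
        return off

    def put(self, value, **coords):
        self.cells[self.offset(**coords)] = value

    def get(self, **coords):
        return self.cells[self.offset(**coords)]


class FormulaCube:
    """A cube held as a formula and evaluated on demand, like a relational view."""

    def __init__(self, formula, *sources):
        self.formula = formula
        self.sources = sources

    def get(self, **coords):
        return self.formula(*(src.get(**coords) for src in self.sources))


# Hypothetical dimensions and values, chosen only for illustration.
dims = {
    "product": ["P1", "P2", "P3"],
    "store": ["London", "Paris"],
    "time": ["Jan97", "Feb97"],
}
revenue = Cube(dims)
expenses = Cube(dims)
revenue.put(500, product="P2", store="London", time="Jan97")
expenses.put(320, product="P2", store="London", time="Jan97")

# Profit = Revenue - Expenses is stored as a formula, not as data.
profit = FormulaCube(lambda r, e: r - e, revenue, expenses)

print(revenue.position(product="P2", store="London", time="Jan97"))  # [1, 0, 0]
print(profit.get(product="P2", store="London", time="Jan97"))        # 180
```

No search is needed to find a cell: the dimension values alone determine where it lives. A real multidimensional product would also compress the sparse regions that this dense sketch leaves as empty cells.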
Online analytical processing today generally involves the use of a separate multidimensional server that contains a relatively small amount of highly indexed data from operational systems.

Choosing Between Relational and Multidimensional Servers
Each database server has its own strengths and weaknesses.

• Relational Server
Benefits:
– Well-known environment, with many experts in most organizations able to support the product.
– Can be used with data warehousing and operational systems.
– Many tools available with advanced features, including performance improvements made with report servers.
Disadvantages:
– Lacks the complex functions and analysis capabilities provided by OLAP tools.
– These products may also be restricted in the volumes of data they can access.

• Multidimensional Server
Benefits:
– Quick access to very large volumes of data.
– Extensive and comprehensive libraries of complex functions specifically for analysis.
– Strong modeling and forecasting capabilities.
– Can access multidimensional and relational database structures.
Disadvantages:
– Difficulty of changing dimensions without reaggregating to time.
– Lack of support for very large volumes of data.

[Slide: MOLAP Server. The application layer stores data in a multidimensional structure; the presentation layer provides the multidimensional view. Diagram: DSS client, MOLAP engine (application layer), warehouse.]

Multidimensional OLAP Server (MOLAP)
The multidimensional online analytical processing (MOLAP) engine takes the data from the warehouse or from operational sources. The MOLAP engine then stores the data in proprietary data structures, summarizes it, and precalculates as many outcomes as possible.

Characteristics
• Data is stored as a precalculated array.
• The data resides, or is cached, in a proprietary multidimensional database, with a multidimensional viewer. Both the data and index values are held in arrays.
• The database is organized to allow rapid retrieval of related data across multiple dimensions.
• Data can be offloaded from the server onto the client for local access, reducing network traffic. However, it can take time to form the cubes.
• The MOLAP tools store and process multidimensional data efficiently.
• The calculation engine creates new information from existing data through formulas and transformations.
• The complexity of the underlying data is transparent to the user.
• The tools can exploit the complexity of the analysis involved.
• The complex analytical querying capabilities enable a business to respond to change faster.
• Preaggregated summary data and precalculated measures enable quick and easy analysis of complex data relationships.
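
The idea of preaggregated summaries listed above can be illustrated with a small Python sketch. It is an assumption-laden toy rather than how any MOLAP engine is implemented: the function name preaggregate, the ALL marker, and the sample facts are hypothetical.

```python
from collections import defaultdict
from itertools import product

ALL = "*"  # hypothetical marker: "rolled up over every member of this dimension"


def preaggregate(facts):
    """Precalculate a total for every combination of member or ALL."""
    totals = defaultdict(float)
    for *coords, measure in facts:
        # Each fact feeds 2**n cells: per dimension, keep the member or roll it up.
        for key in product(*[(c, ALL) for c in coords]):
            totals[key] += measure
    return dict(totals)


# Hypothetical atomic facts: (product, store, month, sales amount).
facts = [
    ("P1", "London", "Jan97", 100.0),
    ("P2", "London", "Jan97", 250.0),
    ("P2", "Paris", "Jan97", 75.0),
    ("P2", "London", "Feb97", 125.0),
]
summaries = preaggregate(facts)

# After the load, analysis is a lookup rather than a calculation.
print(summaries[("P2", ALL, ALL)])           # 450.0
print(summaries[(ALL, "London", "Jan97")])   # 350.0
print(summaries[(ALL, ALL, ALL)])            # 550.0
print(len(facts), "facts became", len(summaries), "stored cells")
```

The last line hints at the cost side of the approach: far more cells are stored than facts were loaded, which is one reason forming the cubes takes time.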

[Slide: ROLAP Server. The warehouse stores atomic data; the application layer generates SQL for the three-dimensional view; the presentation layer provides the multidimensional view. Diagram: DSS client, ROLAP engine (application layer), multiple SQL statements, warehouse server.]

Relational Database OLAP Server (ROLAP)
The relational online analytical processing (ROLAP) engine takes data from the relational data warehouse. The ROLAP engine uses its built-in SQL functionality to create a multidimensional representation of the data and presents that to the user as a multidimensional view.

Characteristics
• Data and metadata are stored as records in the relational database. The OLAP server uses this metadata dynamically to generate the SQL statements necessary to retrieve the data as the user requests it.
• Users see a multidimensional view of data that is stored in relational tables.
• End users are supplied with a multidimensional viewing tool to view the relational data.
• There is high-capacity connectivity to powerful servers.
• There are no limitations on the size of the database or the kind of analysis that may be performed. However, if the server is SQL-driven, performance on some engines may suffer severely when the user joins several tables or performs complex computations.
• Complex SQL code is generated by the ROLAP tool. The tools create a number of SQL statements when they access the database; this may adversely affect performance.

[Slide: MOLAP, ROLAP, and HOLAP. Diagram: warehouse, Express Server, and Express user, annotated with question marks.]

MOLAP, ROLAP, and HOLAP
Multidimensional OLAP (MOLAP), relational OLAP (ROLAP), and hybrid OLAP (HOLAP) are terms that can cause some confusion.

OLAP
The key concept is the consistent theme in each of these configurations: online analytical processing. OLAP tools and applications must be able to manipulate and display data using a multidimensional view. The multidimensional data model is specifically designed for this type of analysis, and reflects the way users think about their businesses.
• Performance Versus Storage: The central issue surrounding this OLAP configuration question is the trade-off between performance and storage space. When data is stored in the multidimensional model (MOLAP), data-access performance is maximized for the end user. However, some redundancy of storage results, and multidimensional databases can become extremely large.
When data is stored only in the warehouse and is brought into the multidimensional cache when queried (ROLAP), added storage is not an issue, but query performance suffers.
• Flexible OLAP Access: A complete OLAP solution should provide any of these options. Oracle Express technology is based on a multidimensional data model, but the underlying data can be structured in a number of ways.

[Diagram: MOLAP. Data is periodically loaded from the warehouse into the MDDB on the Express Server, which the Express user queries.]
[Diagram: ROLAP Cache. A query from the Express user causes a live fetch of data from the warehouse into the data cache on the Express Server.]

MOLAP
In a pure MOLAP environment, data from the warehouse, online transactional processing (OLTP) systems, or other external sources is periodically loaded into a multidimensional database (MDDB) such as Oracle Express, where it is presummarized and optimized for analysis.

ROLAP
In a ROLAP environment, relational data from a data warehouse or data mart is retrieved on the fly in response to a user query, and that data is brought into the Oracle Express multidimensional cache.
Once data has been cached into Oracle Express, subsequent access of that same data does not require a refetch of the data from the warehouse.

[Diagram: Hybrid (HOLAP). The Express Server holds an MDDB and a cache; data reaches it from the warehouse by periodic load and by fetch-and-cache, and the Express user queries the Express Server.]

HOLAP
The MOLAP and ROLAP approaches can be combined into a hybrid (HOLAP) solution, which takes advantage of the strengths of both methods. In the hybrid solution, the relational database is used to store the bulk of the detail data, and the multidimensional model is used to store summary data.

[Figure: Choosing a Reporting Architecture. MOLAP and ROLAP plotted by query performance (OK to Good) against analysis (Simple to Complex).]
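
The ROLAP behavior described earlier, in which SQL is generated from metadata on request and fetched results are cached so that repeat access needs no refetch, might look roughly like the following Python sketch. Everything here is a hypothetical stand-in: the RolapEngine class, the METADATA layout, and the sales table are invented for illustration, and the standard library's sqlite3 module plays the part of the warehouse server. It does not depict Oracle Express.

```python
import sqlite3

# Hypothetical metadata describing one fact table in the warehouse.
METADATA = {
    "table": "sales",
    "dimensions": {"product", "store", "month"},
    "measure": "amount",
}


class RolapEngine:
    def __init__(self, conn, metadata):
        self.conn = conn
        self.meta = metadata
        self.cache = {}    # multidimensional cache: requested dimensions -> cells
        self.fetches = 0   # number of live fetches sent to the warehouse

    def build_sql(self, dims):
        """Generate the SQL for a view over the requested dimensions."""
        unknown = set(dims) - self.meta["dimensions"]
        if unknown:
            raise ValueError(f"unknown dimensions: {sorted(unknown)}")
        cols = ", ".join(dims)
        return (
            f"SELECT {cols}, SUM({self.meta['measure']}) "
            f"FROM {self.meta['table']} GROUP BY {cols} ORDER BY {cols}"
        )

    def query(self, *dims):
        key = tuple(dims)
        if key not in self.cache:
            # Cache miss: fetch from the warehouse on the fly.
            self.fetches += 1
            rows = self.conn.execute(self.build_sql(dims)).fetchall()
            self.cache[key] = {row[:-1]: row[-1] for row in rows}
        return self.cache[key]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, store TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("P1", "London", "Jan97", 100.0),
        ("P2", "London", "Jan97", 250.0),
        ("P2", "Paris", "Jan97", 75.0),
    ],
)

engine = RolapEngine(conn, METADATA)
by_store = engine.query("store")   # live fetch
again = engine.query("store")      # served from the cache

print(engine.build_sql(["product", "store"]))
print(by_store)        # {('London',): 350.0, ('Paris',): 75.0}
print(engine.fetches)  # 1
```

A hybrid arrangement would keep frequently used summaries loaded in advance and fall back to a generated query like this one only for detail data.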

Choosing Between ROLAP and MOLAP Architectures and Tools

Factors Influencing Query Tool Choice
The diagram shows that ROLAP serves the user who requires simple analysis and MOLAP serves the user who needs more complex analysis, because of the performance and summarization benefits of MOLAP. There are a number of key issues to consider when determining which product to use:
• Business need: Does the tool fit current and future reporting requirements? Consider whether the tool is able to successfully access the data sources and models needed to provide the information required. Is the tool able to access the volumes of data necessary to perform the analysis required?
• User: Some tools have a steep learning curve and are specialized in their presentation. Is there room in your organization for yet another specialist tool? Does the tool provide the flexibility, functionality, and speed needed?
• GUI: Consider how organized, intuitive, user-friendly, and robust the interface is.
• Computing architecture: Consider existing computer architectures. Decide whether the fat client with its associated features and functionality could be replaced by the thin client. Do the selected tools fit in with your current and planned architecture?
• Network architecture: Consider how the products deploy their requests across the network, and the effects on the network and server. Can the chosen network (WAN, LAN, or MAN) support the analysis approaches chosen? Conversely, can the tool fit within the network architecture defined?
• Network throughput: Does the network have the capacity required?
Is it likely to be affected by access contention? What is your networking strategy? Do you have one?
• Openness: Is the product portable, and does it have the necessary application program interface (API) to connect to the databases you have in place? Can you write or customize APIs?
• Performance: Will the product be able to respond to the variety of queries required in acceptable (defined) time frames? Determine your own speed metrics. Ensure that the tool can meet the response times required by the service level agreement; if it cannot, you should renegotiate.
• Scalability: Consider whether the tool is capable of expanding to meet future needs, for example, moving from a simple daily reporting situation to alert-driven exception reporting, without major modification.
• Management: What kind of management and support does the product require?
Is there a large administrative task in setting up the environment and building end-user layers (metalayers)?
• Enterprisewide perspective: Always consider the tools with an enterprisewide approach in mind, not just local or departmental considerations.

[Slide: Client-Server Access. Mainframe power preserved. Tools: simple query, complex query, data mining. Diagram: warehouse server linked through a common protocol and a common gateway to Windows, Macintosh, OS/2, and UNIX clients.]
[Slide: Web Access. Internet: global network. Intranet: corporate access. Lower costs: hardware, communication, application. Security issues.]

Query Access Architectures
In the industry today, there are many architectures; in the warehouse environment the two most prominent are client-server and Web access.

Client-Server Access
The principle behind the client-server approach is to split the processing between the servers and localized processing on the client. This openness among systems provides the configuration with total flexibility. Different users may run different tools that access the data warehouse.
These tools include:
• Simple query tools
• Complex analysis tools
• Data mining tools

Web Access
At this time, data warehouse information is provided as Web-based applications on intranets (networks within a company), as an alternative to other DSS delivery mechanisms. Internet and intranet access to a warehouse may bring these benefits:
• Lower hardware costs
• Lower communication costs
• Lower application licensing and maintenance costs
• Minimized burden on administrators

Internet Security Issues
Security issues abound in this environment, and you must carefully consider the impact of providing global access to your data. You should consider:
• View-based security techniques, with a permissions table identifying users’ clearance codes. The codes themselves match clearance codes held with the data in the warehouse.
• Caching techniques that let only queries permitted for a given clearance code access the cached data.
• Password abstraction, in which the password a user specifies is converted behind the scenes before access to the database is made available.
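
The view-based technique listed under Internet Security Issues can be sketched as follows. This is a minimal illustration, assuming a permissions table of (username, clearance_code) pairs and a clearance_code column stored with each warehouse row; the table, view, and function names are hypothetical. SQLite (through Python's sqlite3 module) has no user accounts, so the user name is passed as a parameter here, whereas a production database would typically filter on the session user and grant users access to the view only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE permissions (username TEXT, clearance_code TEXT);
    CREATE TABLE sales (region TEXT, amount REAL, clearance_code TEXT);

    -- The view exposes a row only where the user's clearance code
    -- matches the code held with the data.
    CREATE VIEW secured_sales AS
        SELECT p.username, s.region, s.amount
        FROM sales s
        JOIN permissions p ON p.clearance_code = s.clearance_code;
    """
)

# Hypothetical users, codes, and data.
conn.executemany(
    "INSERT INTO permissions VALUES (?, ?)",
    [("analyst", "PUBLIC"), ("director", "PUBLIC"), ("director", "RESTRICTED")],
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", 1200.0, "PUBLIC"),
        ("South", 800.0, "PUBLIC"),
        ("Board", 5000.0, "RESTRICTED"),
    ],
)


def visible_sales(username):
    """Return the rows that the view makes visible to one user."""
    cursor = conn.execute(
        "SELECT region, amount FROM secured_sales WHERE username = ? ORDER BY region",
        (username,),
    )
    return cursor.fetchall()


print(visible_sales("analyst"))   # [('North', 1200.0), ('South', 800.0)]
print(visible_sales("director"))  # [('Board', 5000.0), ('North', 1200.0), ('South', 800.0)]
print(visible_sales("guest"))     # []
```

Because the filtering lives in the view, query tools never need to know the clearance rules; changing a user's access is a change to one row in the permissions table.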

Fat Client
In a client-server architecture, a fat client is a client that performs the bulk of the data processing operations. The data itself is stored on the server. During the 1980s, the industry introduced PCs (clients) with graphical interfaces and high-end servers that could house databases. As these became more popular, companies downsized, rightsized, and reduced mainframe computing architectures. Today, the PC is the foundation of most modern enterprise systems and gives many users the ability to perform many tasks with ease. PCs create some challenges, however:
• They have become “fat,” demanding more software and hardware.
• Administering multiple copies of software is difficult.
• Once developed, client software offers limited reusability in extending applications.
• Users require only a limited selection of the software available on the PC.
• PCs are costly to purchase and maintain, given the amount of software required to support each device.

Thin Client
In client-server applications, a thin client is designed to be especially small so that the bulk of the data processing occurs on the server. A thin client is a network computer without a hard disk drive, whereas a fat client includes a disk drive.
Advances in Internet technology, decreases in the cost of high-end servers, and increases in the total cost of purchasing, supporting, and maintaining PCs are prompting IT departments to reconsider their client-server strategy. They are starting to use the features of the Web to eliminate the reliance on PCs. To this end, the “thin” client (a browser) is a device that contains the application logic, connected to the high-end server.

Thin client access to a data warehouse across the Web has a number of advantages:
• Lower hardware cost per user
• Lower licensing costs per user (The software is centralized on the server.)
• Open deployment platform

Web access is still in its early years and has some challenges to face. It needs to:
• Evolve from a library of documents to an electronic business platform that can conduct secure transactions on intranets and the Internet
• Provide rich levels of security, data integrity, and distributed transaction support
• Provide robust, scalable, and reusable extensibility

The network computer (NC), available from Oracle, is an example of a thin client.

Summary
The lesson discussed the following topics:
• The purpose of building a data warehouse is to enable users to access the information in the warehouse
• Determining user query needs is an important part of the data warehouse project implementation
• Planning for good data access capability is important to the success of the data warehousing project

Practice 6-1 Overview
This practice covers the following topics:
• Completing a user profile exercise
• Answering true/false questions on user involvement in determining query access to the data warehouse
• Performing the “Security Consideration Checklist” exercise
Practice 6-1

1 Complete the user profile column in this exercise with one of the following user types:
– Executive
– Casual user or manager
– Business analyst or power user

Name: Brian O’Reilly
Access Needs:
• Needs to develop simple forecasts, such as budgets
• Ease of use is important
Technology:
• Microsoft Office
• Internet browser
• Spreadsheets
User Profile:

Name: Mary Ramos
Access Needs:
• One-click access
• Only needs highly summarized information
• Ease of use is very important
Technology:
• E-mail
• Microsoft Office
• Internet browser
User Profile:

Name: Kim Seng
Access Needs:
• Constantly wants to “get more data”
• Understands the organization’s business processes
Technology:
• Spreadsheets
• Oracle Reports
• Oracle Discoverer
• Oracle Express Analyzer
User Profile:

Name: Amber Salinas
Access Needs:
• Lots of drilling
• Customize graphical user interface (GUI)
• Needs to know data structures
Technology:
• Extensive SQL programming
• Oracle7X, Oracle8X Server
• Oracle Express
User Profile:

2 Answer true or false to the following questions.
a Do not involve users in the early process of the data warehouse implementation because they are going to delay your delivery date. (True / False)
b Choose the warehouse data access tools by involving only IT staff because they are the ones who know what the users need. (True / False)
c Prototype access methods with prospective users. (True / False)

3 Security Consideration Checklist exercise: Form into small groups, and discuss each of the following questions. For each question, discuss briefly whether you would use it in your own security consideration checklist back at your workplace, and rate its importance relative to the other questions on the checklist.
Security Consideration Question (for each question, record: Will You Use? Why?)
a Security should be addressed at column level (and in some cases at the row level), at the table level, at the database level, at the tools level, at the client and server level, and at the network level.
b Create views to limit access to particular columns or, in unusual circumstances, rows.
c Do not rely on anything to protect the database except the database security.
d How are reports upgraded when new versions are released?
e Security should be implemented based on what makes the most sense for both the short-term and long-term health of the business. Judge security not only by its structure, but by how well it supports the entire corporate organization’s needs and survival.

Lesson 7: Modeling the Data Warehouse

[Slide: course roadmap with the blocks Defining DW Concepts & Terminology; Planning for a Successful Warehouse; Choosing a Computing Architecture; Meeting a Business Need; Planning Warehouse Storage; Modeling the Data Warehouse; ETT (Building the Warehouse); Analyzing User Query Needs; Supporting End User Access; Managing the Data Warehouse; Project Management (Methodology, Maintaining Metadata)]
Overview

This lesson examines the role of data modeling in a data warehousing environment. The lesson presents a very high-level overview of warehouse modeling steps. You consider the different types of models that can be employed, such as the star schema. Tools available for warehouse modeling are introduced.

Note that the “Modeling the Data Warehouse” block is highlighted in the overview slide on the facing page.

Objectives

After completing this lesson, you should be able to do the following:
• List generic phases for modeling a data warehouse
• List the components of a warehouse data model
• Identify tools available for warehouse modeling

Note: Oracle offers a two-day, instructor-led course entitled Data Warehouse Database Design. That course teaches comprehensive database design by using a case study, whereas this lesson provides a high-level overview.
Data Warehouse Database Design Phases

In the past several years, a number of methods for designing a data warehouse have been published. Although these methods define certain terms differently, all include the same general tasks required to produce a sound data warehouse database design.

This lesson focuses on the major tasks associated with the data warehouse database design process. These tasks have been grouped into four phases:
1. Defining the business model (conceptual model)
2. Creating the dimensional (logical or star schema) model
3. Modeling summaries
4. Creating the physical model

Phase 1: Defining the Business Model (outline)
• Performing strategic analysis: select a business process
• Creating the business (conceptual) model
– Defining business requirements
– Identifying the business measures
– Identifying the dimensions
– Identifying the grain
– Identifying the business definitions and rules
– Verifying data sources
Phase One: Defining the Business Model

Performing Strategic Analysis

Performed at the enterprise level, strategic analysis identifies, prioritizes, and selects the major business processes (also called business events or subject areas) that are most important to the overall corporate strategy. Strategic analysis includes the following steps:
• Identify the business processes that are most important to the overall corporate strategy.
• Understand the business processes by drilling down on the dimensions that characterize each business process.
• Prioritize and select the business process to implement in the warehouse, based on which one will provide the quickest and largest return on investment (ROI).

Creating the Business Model

The strategic analysis step produces a high-level definition of the chosen business process or processes. In this second step of the business modeling phase, a business model is created.

Defining Business Requirements

The business model is created by defining the business analysis requirements for each process. The previous lesson discussed interviewing end users to learn their query needs. You will also need to meet with business managers and business analysts who are directly responsible for the specific business processes in order to:
• Define specific business measures.
• Create a detailed listing of the dimensions that characterize each measure.
• Identify the granularity required to satisfy the analysis requirements.
• Clarify business definitions and business rules.
Verifying Data Sources

Concurrently, you must perform an information systems (IS) data audit, a systematic exploration of the underlying legacy source systems to verify that the data required to support the business requirements is available.

Business Requirements Drive the Design Process

The entire scope of the data warehouse initiative must be driven by business requirements. Business requirements determine:
• What data must be available in the warehouse
• How data is to be organized
• How often data is updated
• End-user application templates
• Maintenance and growth

Primary Input

The business requirements are the primary input to the design of the data warehouse. Information requirements as defined by the business people (the end users) lay the foundation for the data warehouse content.

Other Inputs

Overlaying those requirements with source information and further research regarding how data is used helps to determine the specific data that the data warehouse will provide.
Other sources may be:
• Existing metadata
• Source ER diagrams from relational OLTP systems
• Research
• Legacy nonrelational systems data

Identifying Measures and Dimensions

Measures

A measure contains a numeric value that measures an aspect of the business. Typical examples are gross sales dollars, total cost, profit, margin dollars, or quantity sold. A measure can be additive or partially additive across dimensions. The attribute varies continuously, for example:
• Balance
• Units Sold
• Cost
• Sales

Dimensions

A dimension is an attribute by which measures can be characterized or analyzed. Dimensions bring meaning to raw data. Typical examples are customer name, date of order, or product brand. The attribute is perceived as a constant or discrete value, for example:
• Description
• Location
• Color
• Size

Ultimately, the business requirements document should contain a list of the business measures and a detailed list of all dimensions, down to the lowest level of detail for each dimension. An example is shown in the slide for a retail customer sales process.
Distinguishing Between Measures and Dimensions

During the warehouse design, you must decide whether a piece of data is a measure or a dimension. You can use the following as a guide:
• If the data regularly changes value, it is a measure; for example, units sold or account balances.
• If the data is constant (a discrete value), it is a dimension. For example, the color of a product and the address of a customer are unlikely to change frequently.
• A need or capability to summarize often identifies a measure.
• Dimensions are typically represented along the axes of existing reports.

These rules are not definitive but act as a guide where there is indecision.

Determining Granularity

[Slide: candidate grains for the time dimension: YEAR? QUARTER? MONTH? WEEK? DAY?]

When gathering more specific information about measures and analytic parameters, it is also important to understand the level of detail that is required for analysis and business decisions. This level of detail is called granularity. The greater the level of detail, the finer the level of granularity.

The Key Question

What do your users really need for now and for the near-term future? Determine that and then design for one grain finer.
Consider that users typically perform fine-grain analysis on a short horizon, maybe six weeks. Thus, as a solution, you can retain six weeks of data online and roll off the aged data automatically.

Note: Remember that you can always aggregate upward, but you cannot disaggregate lower than the data that is stored in the data mart.

Identifying Business Rules

Examples of documented dimension values and hierarchies:
• Location: geographic proximity (0 - 1 miles, 1 - 5 miles, > 5 miles)
• Product: type (PC, Server); monitor (15 inch, 17 inch, 19 inch, None); status (New, Rebuilt, Custom)
• Time: Month > Quarter > Year
• Store: Store > District > Region

Business model elements should also be documented with agreed-upon business rules and definitions. For example, the wholesale computer sales process might include the following business rules:
• All product items are grouped by status.
• March, April, and May make up the first quarter in the fiscal year.
• A store is in one and only one district.
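A business rule such as the fiscal-quarter definition above is easier to apply consistently when it is recorded in one place. The following is a minimal sketch, not part of the course material: the function names and the monthly figures are hypothetical, and it assumes that the remaining fiscal quarters follow March through May in consecutive three-month blocks. It also illustrates the note on granularity: month-grain data can be aggregated upward to quarters, but the quarterly totals cannot be split back into months.

```python
from collections import defaultdict

# Business rule from the example: March, April, and May form fiscal quarter 1.
# Assumption: the other quarters follow in consecutive three-month blocks.
FISCAL_YEAR_START_MONTH = 3  # March


def fiscal_quarter(month: int) -> int:
    """Map a calendar month (1-12) to a fiscal quarter (1-4)."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    months_into_fiscal_year = (month - FISCAL_YEAR_START_MONTH) % 12
    return months_into_fiscal_year // 3 + 1


def roll_up_to_quarters(monthly_sales: dict) -> dict:
    """Aggregate month-grain totals upward to fiscal-quarter grain."""
    quarterly = defaultdict(float)
    for month, amount in monthly_sales.items():
        quarterly[fiscal_quarter(month)] += amount
    return dict(quarterly)


# Hypothetical month-grain sales figures (calendar month -> sales dollars).
monthly_sales = {3: 100.0, 4: 120.0, 5: 80.0, 6: 90.0, 2: 50.0}
quarterly_sales = roll_up_to_quarters(monthly_sales)
print(quarterly_sales)
```

Keeping the rule in a single function means that a change to the fiscal calendar is made once rather than in every query or report.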
Phase Two: Creating the Dimensional Model

When you complete the first phase, defining the business model, you proceed to the second phase, creating the dimensional (logical) model.
• Identify fact tables
– Translate business measures into fact tables
– Analyze source system information for additional measures
– Identify base and derived measures
– Document additivity of measures
• Identify dimension tables
• Link fact tables to dimension tables
• Create views for users
Dimension Tables

Dimension tables have the following characteristics:
• Contain textual information that represents the attributes of the business
• Contain relatively static data
• Are joined to a fact table through a foreign key reference

Dimensions are the textual descriptions of the business. Dimension tables are typically smaller than fact tables and the data changes much less frequently. Dimension tables give perspective regarding the whys and hows of the business and element transactions. While dimensions generally contain relatively static data, customer dimensions are updated more frequently.

Dimensions Are Essential for Analysis

The key to a powerful dimensional model lies in the richness of the dimension attributes because they determine how facts can be analyzed. Dimensions can be considered as the entry point into “fact space.” Always name attributes in the users’ vocabulary. That way, the dimension will document itself and its expressive power will be apparent.

Fact Tables

Fact tables have the following characteristics:
• Contain numeric measures (metrics) of the business
• May contain summarized (aggregated) data
• May contain date-stamped data
• Are typically additive
• Have a key value that is typically a concatenated key composed of the primary keys of the dimensions
• Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables
Fact Tables

Facts are the numerical measures of the business. The fact table is the largest table in the star schema and is composed of large volumes of data. Although a star schema typically contains one fact table, other DSS schemas can contain multiple fact tables.

Raw facts such as dollar sales can be combined or calculated with other facts to create measures. Measures can be stored in the fact table or created when necessary for reporting purposes.

Dimensional Model (Star Schema)

[Slide: a central fact table, Facts (units, price), surrounded by the dimension tables Product, Channel, Customer, and Time]

Schema

A schema is a collection of database objects, such as tables, views, indexes, and synonyms.

Dimensional Model

The dimensional model has a single fact table and one or more lookup or dimension tables for analytical purposes.

Star Schema

The star schema is the simplest form of a dimensional model. The fact table contains foreign keys that reference primary keys in the dimension tables.
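The key relationships described above can be sketched with a small in-memory database. This is an illustration only, written in Python with SQLite rather than an Oracle server; the table names follow the Product, Channel, Customer, and Time dimensions and the units and price facts of the slide, while the column names and all row values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes, one primary key each.
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, product_desc  TEXT);
    CREATE TABLE channel  (channel_id  INTEGER PRIMARY KEY, channel_desc  TEXT);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE time_dim (day_id      INTEGER PRIMARY KEY, month_name    TEXT);

    -- Fact table: numeric measures plus one foreign key per dimension.
    -- Its key is the concatenation of the dimension keys.
    CREATE TABLE sales_fact (
        product_id  INTEGER REFERENCES product(product_id),
        channel_id  INTEGER REFERENCES channel(channel_id),
        customer_id INTEGER REFERENCES customer(customer_id),
        day_id      INTEGER REFERENCES time_dim(day_id),
        units       INTEGER,
        price       REAL,
        PRIMARY KEY (product_id, channel_id, customer_id, day_id)
    );
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO product VALUES (?, ?)", [(1, "Monitor"), (2, "Server")])
conn.executemany("INSERT INTO channel VALUES (?, ?)", [(1, "Direct"), (2, "Web")])
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Acme")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?)", [(1, "Jan"), (2, "Feb")])
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, 1, 1, 5, 200.0), (1, 2, 1, 1, 3, 200.0), (2, 1, 1, 2, 1, 5000.0)],
)

# A typical star query: constrain or group by dimension attributes,
# then sum an additive measure from the fact table.
rows = conn.execute("""
    SELECT p.product_desc, t.month_name, SUM(f.units) AS total_units
    FROM sales_fact f
    JOIN product  p ON p.product_id = f.product_id
    JOIN time_dim t ON t.day_id     = f.day_id
    GROUP BY p.product_desc, t.month_name
    ORDER BY p.product_desc, t.month_name
""").fetchall()
print(rows)
```

Each dimension is reached from the fact table with a single join, which is what keeps star queries simple for both users and front-end tools.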
Star Schema Model

• Central fact table
• Radiating dimensions
• Denormalized model

Example tables:
• Product Table: Product_id, Product_desc, …
• Store Table: Store_id, District_id, ...
• Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_dollars, Sales_units, ...
• Time Table: Day_id, Month_id, Period_id, Year_id
• Item Table: Item_id, Item_desc, ...

Characteristics:
• Easy for users to understand
• Fast response to queries
• Simple metadata
• Supported by many front end tools
• Less robust to change
• Slower to build
• Does not support history

A star schema model can be depicted as a simple star; a central table contains fact data, and multiple tables radiate out from it, connected by database primary and foreign keys. Unlike other database structures, a star schema has denormalized dimensions.
A star model:
• Is easy for users to understand because the structure is so simple and straightforward
• Provides fast response to queries through optimization and a reduced number of physical joins between fact and dimension tables
• Contains simple metadata
• Is supported by many front end tools
• Is slow to build because of the level of denormalization

The star schema is emerging as the predominant model for data warehouses.

Snowflake Schema Model

Example tables:
• Product Table: Product_id, Product_desc
• Store Table: Store_id, Store_desc, District_id
• District Table: District_id, District_desc
• Sales Fact Table: Item_id, Store_id, Sales_dollars, Sales_units
• Time Table: Week_id, Period_id, Year_id
• Item Table: Item_id, Item_desc, Dept_id
• Dept Table: Dept_id, Dept_desc, Mgr_id
• Mgr Table: Dept_id, Mgr_id, Mgr_name

Characteristics:
• Direct use by some tools
• More flexible to change
• Provides for speedier data loading
• May become large and unmanageable
• Degrades query performance
• More complex metadata

Example dimension hierarchy: Country, State, County, City
Snowflake Schema Model

A snowflake model is closer to an entity relationship diagram than the classic star model because the dimension data is more normalized. Developing a snowflake model means building class hierarchies out of each dimension (normalizing the data).

A snowflake model:
• Results in severe performance degradation because of its greater number of table joins
• Provides a structure that is easier to change as requirements change
• Is quicker at loading data into its smaller normalized tables, compared to loading into a star schema’s larger denormalized tables
• Allows using history tables for changing data, rather than level fields (indicators)
• Has a complex metadata structure that is harder for end user tools to support

One of the major reasons why the star schema model has become more predominant than the snowflake model is its query performance advantage. In a warehouse environment, the snowflake’s quicker load performance is much less important than its slower query performance.

Other Warehouse Models

Besides the star and snowflake schemas, there are other models that can be considered.

Constellation

A constellation model (also called galaxy model) simply comprises a series of star models. Constellations are a useful design feature if you have a primary fact table and summary tables of a different dimensionality. They can simplify design by allowing you to share dimensions among many fact tables.

Third Normal Form Warehouse

Some data warehouses consist of a set of relational tables that have been normalized to third normal form (3NF). Their data can be directly accessed using SQL code. They may have more efficient data storage, at the price of slower query performance due to extensive table joins. Some large companies build a 3NF central data warehouse feeding dependent star data marts for specific lines of business.
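The extra joins that the snowflake and third normal form descriptions above refer to can be seen in a small sketch. It uses Python with SQLite for illustration; the Store and District tables echo the snowflake slide, but the denormalized `store_star` table, its `district_desc` column, and all row values are assumptions made for this example. The same question, sales dollars by district, needs one join against the star dimension and two against the snowflaked one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (store_id INTEGER, sales_dollars REAL);

    -- Star: one denormalized store dimension carrying the district description.
    CREATE TABLE store_star (
        store_id INTEGER PRIMARY KEY, store_desc TEXT, district_desc TEXT);

    -- Snowflake: district attributes are normalized into their own table.
    CREATE TABLE store_snow (
        store_id INTEGER PRIMARY KEY, store_desc TEXT, district_id INTEGER);
    CREATE TABLE district (
        district_id INTEGER PRIMARY KEY, district_desc TEXT);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)",
                 [(1, 100.0), (2, 250.0), (3, 75.0), (1, 50.0)])
conn.executemany("INSERT INTO store_star VALUES (?, ?, ?)",
                 [(1, "Store 1", "North"), (2, "Store 2", "North"),
                  (3, "Store 3", "South")])
conn.executemany("INSERT INTO district VALUES (?, ?)",
                 [(10, "North"), (20, "South")])
conn.executemany("INSERT INTO store_snow VALUES (?, ?, ?)",
                 [(1, "Store 1", 10), (2, "Store 2", 10), (3, "Store 3", 20)])

# Star: a single join from the fact table reaches the district description.
star_rows = conn.execute("""
    SELECT s.district_desc, SUM(f.sales_dollars)
    FROM sales_fact f
    JOIN store_star s ON s.store_id = f.store_id
    GROUP BY s.district_desc
    ORDER BY s.district_desc
""").fetchall()

# Snowflake: the same answer needs a second join through the district table.
snowflake_rows = conn.execute("""
    SELECT d.district_desc, SUM(f.sales_dollars)
    FROM sales_fact f
    JOIN store_snow s ON s.store_id = f.store_id
    JOIN district   d ON d.district_id = s.district_id
    GROUP BY d.district_desc
    ORDER BY d.district_desc
""").fetchall()

print(star_rows)
print(snowflake_rows)
```

Both queries return the same totals. The difference lies in the work done to produce them, which grows with each level of hierarchy that is split into its own table.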
Using Summary Data

Phase 3: Modeling summaries. Summary data:
• Provides fast access to precomputed data
• Reduces use of I/O, CPU, and memory
• Is distilled from source systems and precalculated summaries
• Usually exists in summary fact tables

Summary data contains fact data that is summarized, such as maximum, minimum, average, and total, rather like the total or subtotal line of a report. When you require summary information, you have two choices:
• Issue the SQL, access the dimensions, then access the base fact table and perform the summary calculations on all the selected rows to produce the result (possibly involving thousands to millions of rows).
• Issue the SQL, access the dimensions, then access the keys to the related summary table and find the row with the presummarized data to produce the result.

Having direct access to a summary table containing precomputed data reduces disk I/O, CPU sort, and memory swapping requirements.

Summary data is also referred to as aggregated data, aggregated facts, or aggregated detail.
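The two choices above can be compared side by side in a small sketch. It uses Python with SQLite for illustration; the table names `sales_fact` and `sales_by_product`, their columns, and the row values are hypothetical, and the dimension lookup is omitted to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_fact (
        product_id INTEGER, day_id INTEGER, sales_dollars INTEGER);
    CREATE TABLE sales_by_product (
        product_id INTEGER PRIMARY KEY, total_sales INTEGER);
""")

# Hypothetical base fact rows.
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)",
                 [(1, 1, 100), (1, 2, 150), (2, 1, 80), (2, 2, 20)])

# Load-time step: compute the summary once and store it.
conn.execute("""
    INSERT INTO sales_by_product
    SELECT product_id, SUM(sales_dollars)
    FROM sales_fact
    GROUP BY product_id
""")

# Choice 1: scan the base fact rows and summarize at query time.
on_the_fly = conn.execute(
    "SELECT SUM(sales_dollars) FROM sales_fact WHERE product_id = ?", (1,)
).fetchone()[0]

# Choice 2: read the single presummarized row by its key.
precomputed = conn.execute(
    "SELECT total_sales FROM sales_by_product WHERE product_id = ?", (1,)
).fetchone()[0]

print(on_the_fly, precomputed)
```

With four rows the difference is invisible, but the first query touches every matching fact row while the second reads one row regardless of how large the fact table grows.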
Lightly and Highly Summarized Data

Summary data falls into two loose categories:
• Lightly summarized data is summarized from the incoming fact data and normally stored over a unit of time. Please refer to the earlier discussion on granularity.
• Highly summarized data is more compact. It may be distilled from lightly summarized data or introduced into the warehouse already in the highly compact format.

Designing Summary Tables

Typical summary functions:
• Average
• Maximum
• Total
• Percentage

[Slide: a summary grid with the columns Units, Sales($), and Store and the rows Product A Total, Product B Total, and Product C Total]

Summary tables contain fact data that is aggregated using functions such as total, average, and margin. The summary table shares the dimensions used by the fact data. Summary data usually exists in summary fact tables, but it may exist in dimension tables if it is discrete (such as year-to-date figures).

For example, a customer dimension may contain attributes, such as city, state, and country. The summary table can use these hierarchical attributes to show summary measures for those dimensions of the business.

How Many Summaries?
The issue with summary tables is not whether you are going to have any, but how many you are going to have. Business users require summary information. For example, a manager needs the bottom-line figures that show how well the company is performing. Analysis of the requirement is instrumental in ensuring that users get the information they need and that they get it quickly. A warehouse may contain hundreds of summary tables.

What to Summarize
Deciding what summary data to maintain in the warehouse is an early design consideration and is based on the users' query requirements. These requirements are determined early on during analysis, and should be documented, implemented, and monitored. You can identify a summary requirement that was not specified earlier by monitoring SQL statements for commonly used GROUP BY clauses. A well-designed set of summary tables improves query performance by allowing queries direct access to precomputed summaries and predefined views of data.

[Slide: Summary Tables Example]

SALES FACTS
Sales$   Region   Month
10,000   North    Jan 99
12,000   South    Feb 99
11,000   North    Jan 99
15,000   West     Mar 99
18,000   South    Feb 99
20,000   North    Jan 99
10,000   East     Jan 99
 2,000   West     Mar 99

SALES BY MONTH/REGION
Month    Region   Tot_Sales$
Jan 99   North    41,000
Jan 99   East     10,000
Feb 99   South    30,000
Mar 99   West     17,000

SALES BY MONTH
Month    Tot_Sales
Jan 99   51,000
Feb 99   30,000
Mar 99   17,000
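As a minimal sketch, the two summary tables on the slide can be derived from the eight SALES FACTS rows with GROUP BY. The example uses Python's sqlite3 module instead of Oracle, and the lowercase table and column names are illustrative. The month-level table is built from the month/region summary rather than from the facts, which mirrors the idea of distilling highly summarized data from lightly summarized data. Summing the eight fact rows as listed gives 30,000 for South in Feb 99.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_facts (sales INTEGER, region TEXT, month TEXT)")
conn.executemany("INSERT INTO sales_facts VALUES (?, ?, ?)", [
    (10000, "North", "Jan 99"), (12000, "South", "Feb 99"),
    (11000, "North", "Jan 99"), (15000, "West", "Mar 99"),
    (18000, "South", "Feb 99"), (20000, "North", "Jan 99"),
    (10000, "East", "Jan 99"), (2000, "West", "Mar 99"),
])

# Lightly summarized: one row per month and region.
conn.execute("""
    CREATE TABLE sales_by_month_region AS
    SELECT month, region, SUM(sales) AS tot_sales
    FROM sales_facts
    GROUP BY month, region
""")

# Highly summarized: distilled from the lighter summary, not from the facts.
conn.execute("""
    CREATE TABLE sales_by_month AS
    SELECT month, SUM(tot_sales) AS tot_sales
    FROM sales_by_month_region
    GROUP BY month
""")

by_month_region = {
    (month, region): total
    for month, region, total in conn.execute(
        "SELECT month, region, tot_sales FROM sales_by_month_region"
    )
}
by_month = dict(conn.execute("SELECT month, tot_sales FROM sales_by_month"))

print(by_month_region)
print(by_month)
```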
Summary Tables Example
Assume a banking scenario.
Simple cumulative data gives the total of deposit transactions, summarized in the warehouse, for a given day and for every day thereafter. With this method there is no loss of detail, but a lot of processing is required when querying data.
Rolling summarized data brings in daily totals for the first seven days. On day eight, the first seven days are totaled and stored as a weekly record. At the end of the month, weekly records are added together to create a monthly record. You reset weekly and monthly records (slots) to zero at appropriate points. With this method less processing is required when querying data, but the detail is lost.

Summary Table Management
The requirement for summary tables may change over time, as what constitutes a popular query changes. Queries may be seasonal; for example, you may have specific queries for spring, summer, autumn, and winter. The query management process should be able to identify the summaries that are used, the summaries that need to be created, and the summaries that may be removed.
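The rolling summary scheme from the banking example can be sketched as follows. This is a toy model, not a prescribed implementation: the class name RollingSummary is hypothetical, a month is assumed to be exactly four weeks, and the weekly record is written as soon as the seventh daily total arrives rather than on day eight.

```python
class RollingSummary:
    """Toy rolling summary: 7 daily slots -> weekly records -> monthly records."""

    def __init__(self, weeks_per_month=4):  # assumption: a 28-day "month"
        self.weeks_per_month = weeks_per_month
        self.daily = []     # up to 7 daily totals
        self.weekly = []    # weekly records for the current month
        self.monthly = []   # completed monthly records

    def add_day(self, total):
        """Store one daily total and roll up when a slot group is full."""
        self.daily.append(total)
        if len(self.daily) == 7:
            # Total the seven days into a weekly record, then reset the slots.
            self.weekly.append(sum(self.daily))
            self.daily = []
        if len(self.weekly) == self.weeks_per_month:
            # Total the weekly records into a monthly record, then reset.
            self.monthly.append(sum(self.weekly))
            self.weekly = []


summary = RollingSummary()
for _ in range(28):          # 28 days of deposits totaling 100 per day
    summary.add_day(100)
month_end_state = (list(summary.daily), list(summary.weekly), list(summary.monthly))

for _ in range(3):           # three more days start the next week
    summary.add_day(100)
print(month_end_state, summary.daily)
```

After the roll-up only the monthly figure remains for the first 28 days, so a query for the month reads one record, but the individual daily totals can no longer be recovered.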
[Slide: Summary Management in Oracle8i. Diagram of a Sales table and a Sales summary over the Region, State, City, Product, and Time dimensions, with a summary advisor reporting summary usage, space requirements, and summary recommendations.]

Summary Management in Oracle8i
Oracle8i summary management features include three major components:
• Query-rewrite capabilities
• Mechanisms for maintaining summary tables, including incremental updates
• Advisory capabilities that help the warehouse administrator create and delete summaries, based on usage

Summary Advisor
The Oracle8i summary advisor offers the following information:
• Summary usage: such as the number of times a rewrite was made to use a summary, the space used by a summary, and a cost-benefit ratio for each summary
• Summary recommendations: such as creation, retention, and dropping of summaries
• Space requirements: based on queries for possible summaries

Materialized Views
Summaries are stored in materialized views. While creating materialized views, you can specify storage options to control the size and location of the views.

Query Rewrite
The Oracle8i cost-based optimizer may use a summary to satisfy a query on the base table (SALES). The process of transforming a query to access materialized views, such as the query using the SALES table in the example, is called a query rewrite. If the SALES table consists of several million rows and the materialized view contains a few thousand rows, the query executes much faster. Query rewrite is the key benefit enabled by materialized views.
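The idea behind query rewrite can be illustrated with a toy routing function. This is not Oracle's optimizer algorithm: the Summary class, the choose_source function, the table names, and the row counts are all hypothetical, and the only rule modeled is that a SUM query can be answered from a summary whose grouping columns include every column the query groups by.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Summary:
    name: str
    group_by: frozenset   # columns the summary is grouped by
    row_count: int        # approximate size, used here as the cost


def choose_source(query_group_by, summaries, base_table, base_rows):
    """Return the smallest table able to answer a SUM query.

    A summary qualifies when its grouping columns include every column
    the query groups by, because SUM totals can be re-aggregated.
    """
    needed = frozenset(query_group_by)
    best_name, best_rows = base_table, base_rows
    for s in summaries:
        if needed <= s.group_by and s.row_count < best_rows:
            best_name, best_rows = s.name, s.row_count
    return best_name


summaries = [
    Summary("sales_by_region_month", frozenset({"region", "month"}), 5_000),
    Summary("sales_by_month", frozenset({"month"}), 36),
]

by_month = choose_source(["month"], summaries, "SALES", 5_000_000)
by_region = choose_source(["region"], summaries, "SALES", 5_000_000)
by_product = choose_source(["product"], summaries, "SALES", 5_000_000)
print(by_month, by_region, by_product)
```

A query grouped by month is routed to the 36-row summary, a query grouped by region falls back to the region/month summary, and a query grouped by product has no usable summary and reads the base table.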
[Slide: Using Time in the Data Warehouse]

[Slide: The Time Dimension]
• Time is critical to the data warehouse.
• A consistent representation of time is required for extensibility.
[Figure: Sales fact and Time dimension. Where should the element of time be stored?]

Using Time in the Data Warehouse
Although time may seem straightforward, real-life aggregations of time can be quite complex. Which weeks roll up to which quarters? Is the first quarter the calendar months of January, February, and March, or the first 13 weeks of the year that begin on Monday?
Some causes of nonstandardization are:
• Some countries start the work week on Monday, others on Sunday.
• Weeks do not cleanly roll up to years, because a calendar year is one day longer than 52 weeks (two days longer in leap years).
• There are differences between calendar and fiscal periods. Consider a warehouse that includes data from multiple organizations, each with its own calendar.
• Holidays are not the same for all organizations and all locations.
Representing time is critical in the data warehouse.
You may decide to store multiple hierarchies in the data warehouse to satisfy the varied definitions of units of time. If you are using external data, you may find that you create a hierarchy or translation table simply to be able to integrate the data. Matching the granularity of time defined in external data to the time dimension in your own warehouse may be quite difficult.

The Time Dimension
Because online transaction data, typically the source data for the warehouse, often does not carry a time element, you apply an element of time in the extraction, transformation, and transportation process. For example, you might assign a week identifier to all the airline tickets that were sold within that week. The transaction may not have a time or date stamp on it, but you know the date on which the sale occurred from the generation of the transaction file.
The dimension of time is most critical to the data warehouse. A consistent representation of time is required for extensibility.

Storing the Time Dimension
Typically there is a time dimension table in the data warehouse, although time elements may be stored on the fact table. Before deciding where to store time, you must consider the following:
• Almost every data warehouse has a time dimension.
• Organizations use a variety of time periods for data analysis.
• A row whose key is an SQL date may be populated with additional time qualifiers needed to perform business analysis, such as workday, fiscal period, and special events.
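A time dimension row keyed by date and populated with additional qualifiers might be generated as in the sketch below. The column names, the fiscal year starting on June 1, and the single sample holiday are assumptions made for illustration; a real warehouse would take these from the organization's own calendars.

```python
from datetime import date, timedelta

FISCAL_YEAR_START_MONTH = 6      # assumption: fiscal year begins on June 1
HOLIDAYS = {date(1999, 1, 1)}    # assumption: one sample holiday


def time_dimension_row(d):
    """Build one time dimension row for calendar date d."""
    iso_year, iso_week, iso_weekday = d.isocalendar()
    # The fiscal year is named for the calendar year in which it ends.
    fiscal_year = d.year + 1 if d.month >= FISCAL_YEAR_START_MONTH else d.year
    months_into_fy = (d.month - FISCAL_YEAR_START_MONTH) % 12
    return {
        "date_key": d.isoformat(),
        "calendar_quarter": (d.month - 1) // 3 + 1,
        "iso_week": iso_week,                     # ISO weeks start on Monday
        "fiscal_year": fiscal_year,
        "fiscal_quarter": months_into_fy // 3 + 1,
        "is_workday": iso_weekday <= 5 and d not in HOLIDAYS,
        "is_holiday": d in HOLIDAYS,
    }


def build_time_dimension(start, end):
    """Return one row per day from start to end, inclusive."""
    days = (end - start).days + 1
    return [time_dimension_row(start + timedelta(days=n)) for n in range(days)]


rows = build_time_dimension(date(1999, 1, 1), date(1999, 12, 31))
new_year = rows[0]
june_first = next(r for r in rows if r["date_key"] == "1999-06-01")
print(len(rows), new_year, june_first)
```

The first row shows why weeks do not roll up cleanly: under ISO numbering, 1 January 1999 belongs to week 53 of the previous year, and 1 June 1999 falls in calendar quarter 2 but fiscal quarter 1 under the assumed fiscal calendar.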
[Slide: Creating the Physical Model]
Phase 4: Creating the Physical Model
• Translate the dimensional design to a physical model for implementation
• Define storage strategy for tables and indexes
• Perform database sizing
• Define initial indexing strategy
• Define partitioning strategy
• Update metadata document with physical information

[Slide: Physical Model Design Tasks]
• Define naming and database standards
• Perform database sizing
• Design tablespaces
• Develop initial indexing strategy
• Develop data partition strategy
• Define storage parameters
• Set initialization parameters
• Use parallel processing

Creating the Physical Model
The physical model resides in the relational database server (RDBMS). You need to ensure that each object stored (primarily tables) is held in the appropriate manner and has all the indexes necessary for optimal performance. There are other considerations that you should bear in mind for performance, such as data partitioning.

Dimensional Model to Physical Model
The mapping of the dimensional model to the physical elements is accomplished by performing the following on the base dimensional model:
• Add the format, such as data types and lengths, to the attributes of each entity.
• Define storage strategy for tables and indexes.
• Perform database sizing.
• Define the initial indexing strategy.
• Define partitioning strategy.
• Update metadata document.
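The "Perform database sizing" step above often starts as a back-of-envelope calculation. The sketch below is one such estimate under stated assumptions: the function name, the row volume, the retention period, the row length, and the percentage allowances for indexes and summary tables are all hypothetical inputs, and real sizing must also account for block overhead, free space, and temporary space.

```python
def estimate_fact_table_bytes(rows_per_day, days_retained, row_bytes,
                              index_overhead_pct=50, summary_overhead_pct=30):
    """Rough size: raw fact data plus percentage allowances for indexes
    and summary tables (both allowances are assumptions, not Oracle rules)."""
    raw = rows_per_day * days_retained * row_bytes
    return raw * (100 + index_overhead_pct + summary_overhead_pct) // 100


# Hypothetical workload: 1 million 100-byte fact rows per day, kept for 2 years.
total_bytes = estimate_fact_table_bytes(
    rows_per_day=1_000_000, days_retained=730, row_bytes=100
)
total_gb = total_bytes / 10**9
print(total_bytes, total_gb)
```

Integer percentages keep the arithmetic exact. With these inputs the raw fact data is 73 GB and the allowances bring the estimate to 131.4 GB.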
Physical Model Design Tasks
A good physical model is often the difference between data warehouse success and failure. The design of the physical model builds on the logical model, adding indexes, referential integrity, and physical storage characteristics.
Transforming the base dimensional data model into the physical model includes:
• Defining naming and database standards
• Performing an initial sizing for the data warehouse database
• Designing tablespaces
• Defining an initial indexing strategy, such as primary, unique, nonunique, and bitmapped indexes, for loading programs and end-user access (this may include dropping the indexes before batch load routines and re-creating them afterward)
• Using partitioning to split table and index data into smaller, more manageable chunks
• Determining where to place database objects on disk, for example through disk mapping, striping, or RAID
• Setting initialization parameters
• Using parallel processing

[Slide: Using Data Modeling Tools]
• Tools with a GUI enable definition, modeling, and reporting
• Avoid a mix of modeling techniques caused by:
  – Development pressures
  – Developers with lack of knowledge
  – No strategy
• Determine a strategy
• Write and publish formally
• Make available electronically
[Figure: CASE tools, spreadsheets, and paper and pencil]

[Slide: GUI Tool Interface]
Data Modeling Tools
You can model the warehouse database by using tools that provide a GUI for:
• Entering metadata definitions of facts, dimensions, hierarchies, and relationships
• Drawing diagrams of star schemas containing the facts and dimensions
• Documenting business requirements
• Defining integrity rules and constraints
• Generating reports about your metadata definitions

Techniques and Considerations
Avoid implementing your data warehouse using a mixture of techniques or models. This mixture is often caused by:
• Pressure on development, where a combination of all previous models is considered a quick approach
• Designers who lack knowledge or training
• Lack of a coherent and available strategy
Determine a strict modeling strategy, and publish the approved strategy formally throughout the business subject areas.
Consider establishing a data warehouse group to write and maintain all standards and procedures, or to adapt existing standards and procedures to accommodate data warehousing. The documents should be made available electronically (on the Web, for example) and placed in a central repository.

GUI Data Modeling Tools
WTI Partner                    Product
Logic Works (see note below)   Erwin
Micro Strategy                 DSS Architect and DSS Agent
Oracle                         Designer, Data Mart Designer
Prism Solutions, Inc.          Inmon Generic Data Models
Smart Corporation              Smart DB Workbench

These tools are also referred to as computer-aided software engineering (CASE) tools. Rather than using these tools, many warehouse implementers simply use spreadsheets or paper and pencil to model their designs and document the metadata.
Note: Logic Works was acquired by Platinum, which in turn was acquired by Computer Associates.
[Slide: Summary]
This lesson discussed the following topics:
• Creating a business model
• Creating a dimensional model
• Modeling the summaries
• Creating a physical model
[Figure: Select among business processes, then Business model, Dimensional model, Physical model]

Summary
In this lesson, you explored one process for modeling the warehouse database. This lesson discussed the following topics:
• Creating a business model driven by business processes
• Creating a logical dimensional model containing a central fact characterized by several dimensions
• Modeling the summaries needed for end-user analysis
• Translating the logical model to a physical model
Note: Oracle offers a two-day, instructor-led course entitled Data Warehouse Database Design. That course uses a case study to teach comprehensive database design, whereas this lesson provided a high-level overview.
[Slide: Practice 7-1 Overview]
This practice covers the following topics:
• Specifying true or false to a series of statements
• Completing a series of sentences accurately
• Practicing identifying a simple business model

Practice 7-1
1 Identify whether the following statements are true or false.
Question                                                                          True   False
The business model is a logical representation of selected business processes.    ____   ____
The star model is normalized.                                                      ____   ____
The snowflake model is denormalized.                                               ____   ____
All warehouses must have a time dimension.                                         ____   ____
In a warehouse environment, data loading performance is less important
than query performance.                                                            ____   ____
2 Complete these sentences.
a Access to data in a _________ table is faster than calculating aggregates at the time of query execution.
b The data warehouse model contains ____ tables that comprise the measures of the business.
c Dimensions are denormalized in a _______ model.
d A common guideline is to define granularity at one level ________ than currently used by end users.
3 Practice identifying a simple business model. Pair up with a partner and take turns interviewing each other to sketch a simple business model.
a Ask your partner to list several of the most important business processes in his or her organization.
b Ask him or her to choose the single business process that would be easiest to model and would deliver the best return on investment in a short time as a data warehouse project.
c For the chosen business process, help your partner identify one or two business measures and the dimensions that give meaning to those measures.

Lesson 8: Choosing a Computing Architecture

[Slide: Overview. Course road map with the blocks Defining DW Concepts & Terminology; Planning for a Successful Warehouse; Meeting a Business Need; Choosing a Computing Architecture; Planning Warehouse Storage; Modeling the Data Warehouse; ETT (Building the Warehouse); Analyzing User Query Needs; Supporting End User Access; Managing the Data Warehouse; and Project Management (Methodology, Maintaining Metadata).]
Overview
The previous lesson covered modeling the data warehouse. This lesson discusses choosing a computing architecture for the warehouse. Note that the “Choosing a Computing Architecture” block is highlighted in the course road map.
Specifically, this lesson examines the computer architectures that commonly support data warehouses. It examines the benefits of each hardware architecture and the reasons for using distributed warehouses. You also examine the technology requirements of a database server for warehousing.

Objectives
After completing this lesson, you should be able to do the following:
• Discuss the architectural requirements for the data warehouse
• Consider the benefits of each hardware architecture
• Describe the database server characteristics required in a warehouse environment
• Review the importance of parallelism for the data warehouse environment

[Slide: Architectural Requirements. Scalability, manageability, availability, extensibility, integration, and flexibility, shown together with user, business, budget, and technology considerations.]
[Slide: Strategy for Architecture Definition]
• Obtain existing architecture plans
• Obtain existing capacity plans
• Document existing interfaces
• Prepare capacity plan
• Prepare technical architecture
• Document operating system requirements
• Develop recovery plans
• Develop security and control plans
• Create architecture
• Create technical risk assessment

Architecture Requirements
The tenets listed on the Architectural Requirements slide are perceived to be the primary tenets in a data warehouse environment: the architecture must be scalable, manageable, available, extensible, flexible, and integrated. This list can be extended to include tunable, reliable, robust, supportable, and recoverable.

Making Compromises
You may have to compromise when balancing user needs and business requirements if budgetary constraints restrain your choices or if technical difficulties are too challenging.
The architecture requirements definition must be considered at an early stage, in parallel with the user requirements. Only at this time can successful choices be made. Architecture requirements definition is a specific phase of the Oracle Data Warehouse Method (DWM).

Strategy for Architecture Definition
You must have a definitive strategy that employs identified and proven technology. Using DWM as a foundation for this discussion, consider some of the tasks you need to perform in the early stages when planning the hardware architecture and surrounding environment.
• Obtain existing plans and outlines of the current technical architecture for the environments that will supply the warehouse.
• Obtain existing capacity plans for the current environments.
• Document existing data warehouse interfaces, and document enterprise data warehouse interface requirements.
• Prepare enterprise data warehouse capacity plan.
• Prepare enterprise data warehouse technical architecture.
• Document enterprise data warehouse system operational requirements.
• Develop recovery and fallback strategy.
• Develop security and control strategy.
• Create enterprise data warehouse architecture.
• Create technical risk assessment.
All of these tasks are mentioned in this lesson, but not in the order identified above.

[Slide: Hardware Architectures]
Involve all experts
• New technology
• Old technology
• Networking

[Slide: Hardware Architectures]
• Robust
• Available
• Reliable
• Extensible
• Scalable
• Supportable
• Recoverable
• Parallel
• VLM
• 64-bit
• Connective
• Open

The Hardware Architecture
Consider the hardware architectures first.
This is an area of the plan where a number of people, including the data warehousing IT team members, must be involved. They include the current database administrators of the operational systems, who have experience and expertise with the current systems and their performance and who can also provide useful input regarding the existing architectures and interfaces.
You must ensure that networking staff are involved as well. Networking is a critical issue for processes such as ETT and user access.

Hardware Requirements
The choice of hardware architecture is critical to the success of the data warehouse and its infrastructure. Warehouses require hardware architectures that are:
• Robust
• Available
• Reliable
• Flexible
• Extensible
• Scalable
• Supportable
• Recoverable
• Parallel
In addition, the architecture should:
• Have a very large memory (VLM) capability
• Be able to use 64-bit addressing
• Be connective and conform to open system standards
Note: Do not confuse the term database server with a file server on a local area network or any other server. For our purposes, the term database server describes the Relational Database Management System (RDBMS) or Database Management System (DBMS).

[Slide: Hardware Architectures]
• SMP
• Cluster
• MPP
• NUMA
• Hybrids use SMP and MPP

[Slide: Evaluation Criteria. Determine the platform for your needs. Chart positioning SMP, clusters, NUMA, and MPP by scalability and maturity, each on a low-to-high scale.]
Hardware Requirements (continued)
Today, hardware architectures support a number of different configurations that are useful for data warehousing and are more cost-effective than hardware architectures previously available:
• Symmetric multiprocessing (SMP): Symmetric multiprocessing architectures are the oldest of the technologies and have a proven track record.
• Cluster: Cluster and massively parallel processing architectures are comparatively new but are more scalable and provide a lot of power.
• Massively parallel processing (MPP) and nonuniform memory access (NUMA): NUMA is an even more recent innovation that gives you the scalability of an MPP environment and the manageability of an SMP environment.
Some architectures are a hybrid, employing both SMP and MPP capabilities.

Evaluation Criteria
By specifying the hardware requirements early on in the development of the warehouse, you have enough lead time to acquire and test the chosen components. Determining the platform depends upon a number of factors, and the different architectures have advantages and disadvantages that you must evaluate before making a final decision:
• A symmetric multiprocessing architecture may be sufficient if you have a small database, can afford a longer response time, and have problems that are not complex. Problem complexity is determined by the number of users, the type of calculations, and the types of queries that the system must handle.
• The larger your database, the more complex your problems, and the shorter the required response time, the closer you are to specifying a massively parallel processing system.
[Slide: Parallel Processing]
• Parallel daily operations
• Shared resources
  – Memory
  – Disk
  – Nothing
• Loosely or tightly coupled
[Figure: layers labeled Application, Database, Operating system, and Hardware]

[Slide: Making the Right Choice]
• Requirements differ from operational systems
• Benchmark
  – Available from vendors
  – Develop your own
  – Use realistic queries
• Scalability important

Parallel Processing
Hardware architectures that contain parallel processors are often categorized according to the resources they share.
• Memory: SMP machines are often described as tightly coupled.
• Disk: Clustered architectures are often described as loosely coupled.
• Nothing: MPP machines are described as loosely or tightly coupled, according to the way communication is accomplished among nodes.
NUMA is an SMP architecture with loosely coupled memory using uniform and nonuniform memory access.

Making the Right Choice
How do you know which architecture to choose? Operational environments do not map directly to the way the warehouse operates, with its unpredictable workloads and scalability requirements.
The only realistic way to determine how your data warehouse database interacts with the hardware configuration is to perform full-scale testing, although you may not always be able to achieve this. When benchmarking, use real user queries against volumes of data that mimic the volumes anticipated in the warehouse. If you are not satisfied with vendor benchmarks, consider developing your own. This adds to the cost of development. However, the cost of a warehouse implementation is already high, and you may find the amount spent on your own benchmark worthwhile in the long term.
Because scalability is probably one of the most important requirements, you might tend toward the choice of an SMP device.
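A home-grown benchmark can start very small. The sketch below is only an illustration of timing a realistic query against a realistic volume: it uses Python's built-in sqlite3 module as a stand-in for the warehouse server, and the table name sales, its columns, the row count, and the query are all hypothetical.

```python
import sqlite3
import time


def build_sample_db(rows):
    """Create an in-memory stand-in database with a hypothetical sales table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        ((i % 10, float(i)) for i in range(rows)),
    )
    conn.commit()
    return conn


def time_query(conn, sql, repeats=3):
    """Run one query several times and return the best elapsed time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        conn.execute(sql).fetchall()
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    conn = build_sample_db(100_000)
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    print(f"best of 3: {time_query(conn, query):.4f} s")
```

In a real benchmark you would replace the generated rows with data volumes close to those anticipated in the warehouse, and the single query with a set of actual user queries.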
Symmetric Multiprocessing
A symmetric multiprocessing (SMP) machine comprises a set of CPUs that share memory. It has a shared everything architecture:
• Each CPU has full access to the shared memory through a common bus.
• Communication between the CPUs uses the shared memory.
• Disk controllers are accessible to all CPUs.
This is a proven technology, particularly in the data warehousing environment.
Note: A bus is a cable or circuit used to transfer data or electrical signals among devices.
Benefits
• High concurrency
• Workload balancing
• Moderate scalability; SMP is not as scalable as MPP or NUMA
• Easier to administer than a cluster environment, with proven tools
Limitations
• Available memory may be limited; this can be enhanced by clustering.
• Bandwidth is limited for CPU-to-CPU communication, I/O, and bus communication.
Note: SMP machines are often nodes in a cluster. Multiple SMP nodes can be used with certain vendors’ architectures (DEC, Pyramid, Sequent, Sun SparcServer) where disk is shared among the multiple nodes. Some warehouse sites are exploring the evolving concept of loaning excess memory or processing capacity among applications or hardware. Some SMP vendors allow you to scale to MPP without losing your SMP box; you simply add interconnect software and associated technology.
Nonuniform Memory
Shared memory systems are systems with loosely coupled memory. The shared memory may be accessed from the CPUs by uniform memory access or by nonuniform memory access (NUMA). The Oracle Parallel Server can work with either form of memory access, but NUMA is a more costly form of access and synchronization than uniform memory access. Although any CPU can access the memory, access is more costly for remote nodes.
Benefits
• A fully scalable architecture that can overcome some of the scalability problems of SMP
• A very scalable parallel architecture, in which disk, CPU, and bandwidth can be added incrementally to any level
• Better performance than an MPP system where there are ad hoc or mixed workloads
• Suited to the Oracle server
Limitations
• The technology is new and less proven.
• You need new tools for easy system management.
• NUMA is more expensive than SMP.
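The remark that memory access is more costly for remote nodes can be made concrete with a weighted average. The sketch below is an illustration only; the function name and the latency figures (100 ns local, 400 ns remote) are hypothetical and are not taken from any particular NUMA system.

```python
def average_access_time(local_fraction, local_ns, remote_ns):
    """Expected memory access time when a fraction of accesses are local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    # Weighted average of the local and remote access costs.
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies: 100 ns for local memory, 400 ns for remote memory.
print(average_access_time(0.75, 100.0, 400.0))  # → 175.0
```

The more of the workload that can be kept on memory local to each node, the closer the average comes to the local access time.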
Clusters
Shared disk, loosely coupled systems have the following characteristics:
• Each node consists of one or more CPUs and associated dedicated memory.
• Memory is not shared between nodes.
• Communication occurs over a high-speed bus.
• Each node has access to all of the disks and other resources.
• An SMP machine can be a node, if the hardware supports it.
Benefits
• High availability; all data is accessible even if one node fails
• The concept of one database, which is an advantage over shared nothing systems such as MPP
• Incremental growth
Limitations
• The bandwidth of the high-speed bus limits the scalability of the system.
• Internode synchronization is required. Each node has a data cache, and cache consistency must be maintained for the locking mechanisms to work effectively.
• The shared disk software imposes an overhead on the operating system.
Massively Parallel Processing
The massively parallel processing (MPP) architecture is a shared nothing architecture. It is concerned with disk access, rather than memory access, and works well with operating systems that provide transparent disk access. You can scale the configuration up by adding more CPUs. If a table or database is located on a disk, access depends entirely on the CPU that owns it. If that CPU fails, the data cannot be accessed, regardless of how many other CPUs are running, unless logical pointers are established to alternative CPUs.
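The ownership rule described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not how any particular MPP database places data: the class name, the hash-based placement, and the failure flag are all hypothetical.

```python
class SharedNothingCluster:
    """Toy model: each row lives on exactly one node, chosen by key hashing."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]
        self.alive = [True] * node_count

    def owner(self, key):
        # Small integer keys hash to themselves, so placement is repeatable.
        return hash(key) % len(self.nodes)

    def put(self, key, row):
        self.nodes[self.owner(key)][key] = row

    def get(self, key):
        node = self.owner(key)
        if not self.alive[node]:
            # No other node can serve this row: nothing is shared.
            raise RuntimeError(f"node {node} is down; key {key} is unreachable")
        return self.nodes[node][key]

    def fail(self, node):
        self.alive[node] = False


cluster = SharedNothingCluster(4)
for order_id in range(8):
    cluster.put(order_id, {"order_id": order_id})
cluster.fail(1)
print(cluster.get(2))  # still reachable: the row is owned by node 2
```

After node 1 fails, a request for key 5 (owned by node 1) raises an error even though three nodes are still running, which is the behavior the paragraph describes when no alternative access path exists.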
Typically, massively parallel architectures have the following characteristics:
• Are very fast compared with SMP and cluster architectures
• Support a few to thousands of nodes
• Provide fast access between nodes
• Have nonshared memory associated with each node
• Have a low cost per node
Massively parallel technology is comparatively new and not proven to the same extent as SMP and cluster technology.
nCUBE Arrangements
Nodes may be organized in a grid arrangement if using nCUBE. Multiprocessor designs provide a scalable architecture that lets you increase performance easily as your needs grow. The key to a multiprocessor system is the interconnect, the mechanism that allows the processors to communicate and cooperate. In an nCUBE system, processors are connected in a multidimensional cube called a hypercube, providing the fastest and densest communications network available. The hypercube network is organized so that connections among processors form cubes. As more processors are added, the cube grows to larger dimensions. The nCUBE system is scalable to hundreds of processors.
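The hypercube interconnect mentioned above can be sketched with the usual binary hypercube numbering, in which two nodes are directly linked when their identifiers differ in exactly one bit. This is a generic illustration; the function names are hypothetical, and the numbering scheme is an assumption rather than a description of nCUBE internals.

```python
def hypercube_neighbors(node, dimension):
    """Return the nodes directly linked to `node` in a binary hypercube."""
    if not 0 <= node < 2 ** dimension:
        raise ValueError("node id out of range for this dimension")
    # Flipping one bit of the node id moves along one edge of the cube.
    return [node ^ (1 << bit) for bit in range(dimension)]


def hop_count(source, target):
    """Minimum number of links between two nodes: the count of differing bits."""
    return bin(source ^ target).count("1")


print(hypercube_neighbors(5, 3))  # → [4, 7, 1]
print(hop_count(0, 7))            # → 3
```

Each added dimension doubles the number of nodes while adding only one link per node and one hop to the longest path, which is why the cube can grow as processors are added.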
Massively Parallel Processing (continued)
Benefits
• Practically unlimited, incremental growth
• Very scalable (given careful data placement)
• Fast access between nodes
• Low cost per node (each node is an inexpensive processor)
Each node has its own devices but, on most systems, other nodes can access the devices of a failed node; a failure may be local to the node.
• Good for DSS and read-only databases
Limitations
• Many database servers require rigid data partitioning for parallelism and scalability (this is not necessary with Oracle).
• Cache consistency must be maintained.
• Disk access is restricted.
• The memory cost per node is high.
• The management burden is high.
• Careful data placement is required for scalability.
Windows NT
The architecture for Windows NT is based on the client-server model. The approach divides the operating system into an executive running in kernel mode and several server processes, each running in user mode. Each server process implements a unique operating system environment.
Benefits
• The Windows NT server operating system includes built-in Web services that provide a complete, integrated intranet solution.
• Windows NT offers scalability improvements of up to 33 percent, yielding more linear scalability on machines with eight or more processors.
• User profiles and system policies provide ease of management and control. They enable system administrators to manage user desktops easily, including the ability to control access to the network and to desktop resources, and they support users who access multiple workstations.
Limitations
• Windows NT is not as secure as other operating systems such as UNIX.
• On other operating systems, you can execute programs on your machine remotely, but you cannot do this with Windows NT.
• Although Windows NT can support SMP with up to 32 processors, it has been criticized for its lack of linear scalability beyond four processors.
• Addressing space limits Windows NT applications to two gigabytes. This is insufficient for large data warehouses.
Architectural Tiers
Architectures can be the simple two-tier type, the more complex three-tier type, or, if Web applications are involved, a four-tier type or greater. Tiered structures are modular and provide logical separation. This enables a useful division of labor for specific tasks and processes, and can assist and complement the network setup.
Two-Tier Architecture
A simple two-tier architecture involves:
• A mainframe CPU, such as IBM, with source data, which is copied and extracted periodically to
• A smaller server, such as Windows NT
A query and analysis tool is then provided for the NT environment. This structure does not fit well into the kind of enterprisewide environments discussed so far. Three-tier architectures are more common.
Three-Tier Architecture
A three-tier architecture employs a separate middleware layer for data access and translation.
• Tier 1 hosts the production applications on a mainframe or midrange system and is devoted to real-time, production-level data processing.
• Tier 2 comprises a departmental server resident with the warehouse users, for example, a UNIX workstation or NT server, which is optimized for query processing, analysis, and reporting.
• Tier 3 comprises the desktop and handles reporting, analysis, and graphical data presentation. PCs are connected on a LAN.
The three-tier architecture is more effective than the two-tier architecture because the first tier is devoted to operational processing, the second to department-level query processing and analysis, and the third to desktop data presentation.
Four-Tier and Greater Architecture
This architecture is similar in structure to the three-tier architecture, with the addition of a Web-based tier.
Middleware
Middleware is a term used to describe technologies that allow you to integrate multiple server technologies seamlessly. Middleware tools are common in today’s computing environment. Oracle gateway technology is one example of middleware that is available off the shelf. In a multitier data warehousing environment with Internet access, the role of middleware is continually being redefined and refined.
Database Server Requirements
The database server (DBMS) must be:
• Robust
• Available
• Reliable
• Flexible
• Extensible
• Scalable
• Supportable
• Recoverable
• Parallel
Parallelism
The driving force behind the warehouse implementation is the needs of the end users who require access to the information. The database environment must handle all operational tasks and processes quickly and efficiently. Parallel capabilities minimize the time taken to perform all the major functions of the warehouse and maximize availability. As you have seen, parallelism at all levels is becoming mandatory for warehouses:
• Database (server)
• Query
• Load
• Index
• Sort
• Backup
• Recovery
Further Considerations
Parallelism is not the only consideration; you must also consider the following:
• The optimization strategy, particularly star query techniques employed with star and snowflake structures (today’s servers enable you to optimize data access in many different ways)
• The partitioning strategy
• Summarization strategies, to ensure that the overhead of creating summaries does not affect the load
• Indexing techniques, in particular, bitmap indexes
• Hardware and software scalability
• Availability of the warehouse
• The system administration, which must easily manage the entire infrastructure
Server Environments
Many different database servers and hardware architectures can be employed for a warehouse solution. It is generally assumed that data warehouse database technology means relational technology.
• Operational Servers: Open or mainframe proprietary database servers (network, hierarchical, or relational), such as Oracle, IMS, DB2, DB2/PE, VSAM, Rdb, Non-Stop SQL, or RMS.
• Warehouse Servers: Open (usually relational) database servers that may be warehouse specific or general purpose, such as Oracle, Informix, Sybase, IBM DB2, NCR/AT&T Teradata, Adabas D, OpenIngres, or Red Brick.
• Data Mart Servers: Relational databases, multidimensional (OLAP) databases, or both; they may be warehouse specific or general purpose, such as Oracle, Oracle Express, Arbor Essbase, MS SQL Server, and NT-based environments.
Parallel Processing
A parallel processor takes a task (usually a large task) and divides it into smaller tasks that can be executed concurrently on one or more nodes (separate processors). As a result, a large task requested by a single user completes more quickly. Before examining the individual parallel features, consider the parallel database.
Parallel Database
A parallel database takes advantage of architectures that share access to data, software, and peripheral devices by running multiple instances that share a single physical database. This type of processing has two key features:
• Increased speed: The server can perform the same task in less time.
• Improved scalability: The server can perform a task many times larger, on a system many times larger, without any performance degradation.
These key features give you the following benefits:
• Higher performance
• Greater availability
• Greater flexibility
• Greater accessibility to online users
All of these features directly benefit the warehouse and are supported by the Oracle7, Oracle8, and Oracle8i Server.
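The two key features of a parallel database, increased speed and improved scalability, are commonly quantified as speedup and scaleup ratios. The sketch below shows one way to compute them; the function names and the elapsed times are hypothetical and are used only for illustration.

```python
def speedup(time_small_system, time_large_system):
    """Same task on both systems: how many times faster the larger one is."""
    return time_small_system / time_large_system


def scaleup(time_small_task_small_system, time_large_task_large_system):
    """Task and system grow together: 1.0 means no performance degradation."""
    return time_small_task_small_system / time_large_task_large_system


# Hypothetical elapsed times in seconds.
print(speedup(120.0, 30.0))   # → 4.0
print(scaleup(120.0, 150.0))  # → 0.8
```

A speedup of 4.0 on four times the hardware is the ideal case. A scaleup of 0.8 means the larger task on the larger system took somewhat longer than the original task did on the original system, so some degradation occurred.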
Parallel Query
Most database servers today support parallel query. Specifically, the Oracle Server parallel query option divides the work of processing a single SQL statement among multiple query server processes. In some applications, particularly decision support systems, an individual query may use vast amounts of CPU resource and disk I/O. The server parallelizes individual queries into units of work that can be processed simultaneously.
Parallel Load
Parallelism can dramatically speed up loading data. Database servers can bypass standard SQL processing (that is, data manipulation language commands, such as INSERT) and load the data directly into the database tables.
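The idea behind parallel query, splitting one statement into units of work and combining the partial results, can be sketched in a few lines. This is a simplified illustration and not the Oracle implementation: the function names are hypothetical, a plain list stands in for table rows, and Python threads are used only to show the decomposition, not to demonstrate real CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done by one query server: aggregate its share of the rows."""
    return sum(chunk)


def parallel_total(rows, degree):
    """Split rows into up to `degree` chunks, aggregate each, then combine."""
    size = max(1, -(-len(rows) // degree))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=degree) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # The coordinator combines the partial results into the final answer.
    return sum(partials)


print(parallel_total(list(range(1, 101)), 4))  # → 5050
```

The same pattern applies to a query such as a SUM over a large fact table: each server scans part of the table, and a coordinator merges the partial aggregates.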
Parallel Index
Creating indexes in parallel decreases the time required to create and reconfigure a warehouse. Many indexes exist in the warehouse database: nearly every attribute on the dimension tables and the composite key values on the fact table are indexed. Indexes take up a lot of space in the warehouse, and you must consider the direct access storage device (DASD) space needed for indexes as well as for fact and dimension tables.
Parallel Sort
Sorting is an intensive task that requires a substantial amount of memory. In a parallel environment, sort areas are allocated more efficiently to reduce serialization and cross-instance pinging. Sort space is cached in memory (in the Oracle server, this is in the System Global Area).
Parallel Backup
With parallel operations, backups can be performed simultaneously from any node of a parallel server.
• Online backups enable the database to be backed up while it is active, allowing users continuous access.
• Offline backups enable the database to be backed up while it is shut down, preventing user access.
Parallel Recovery
The goal of parallel recovery is to employ I/O parallelism to reduce the elapsed time required to perform crash recovery, instance recovery, or media failure recovery. The server uses one process to read the log files sequentially and dispatch redo information to several recovery processes, which apply the changes from the log files to the data files.
Parallel Table Creation
With the Oracle7, Oracle8, and Oracle8i Server, you can create tables in parallel by using the CREATE TABLE AS SELECT (CTAS) statement. This technique can be used to build summary tables.
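Returning to parallel recovery, the division of labor between one sequential log reader and several apply processes can be sketched as follows. This is a toy model under stated assumptions: the function name, the (block_id, value) record format, and the rule of dispatching by block number are hypothetical and chosen only so that changes to the same block are applied in log order.

```python
import queue
import threading


def recover(redo_log, worker_count):
    """Replay (block_id, value) redo records using several apply workers."""
    data_blocks = {}
    queues = [queue.Queue() for _ in range(worker_count)]

    def apply_worker(q):
        while True:
            record = q.get()
            if record is None:  # sentinel: no more redo for this worker
                return
            block_id, value = record
            data_blocks[block_id] = value

    workers = [threading.Thread(target=apply_worker, args=(q,)) for q in queues]
    for worker in workers:
        worker.start()
    # One reader walks the log sequentially and dispatches by block, so
    # changes to the same block are always applied in log order.
    for block_id, value in redo_log:
        queues[block_id % worker_count].put((block_id, value))
    for q in queues:
        q.put(None)
    for worker in workers:
        worker.join()
    return data_blocks


log = [(1, "a"), (2, "b"), (1, "c"), (3, "d")]
print(recover(log, 2) == {1: "c", 2: "b", 3: "d"})  # → True
```

Block 1 ends up with the later value "c" because both of its redo records go to the same worker and are applied in the order in which they were read.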
Summary
This lesson discussed the following topics:
• Outlining the basic architecture requirements for a warehouse
• Highlighting the benefits and limitations of all the different hardware architectures
Practice 8-1 Overview
This practice covers the following topics:
• Defining SMP, NUMA, clusters, and MPP, and stating their benefits and limitations
• Defining parallelism and explaining its importance to the data warehouse
Practice 8-1
1 Form into small groups, and consider each of the following hardware architectures. With your books closed, create a short definition for each architecture.
Each answer should include the benefits and limitations of the architecture.

Architecture    Definition    Benefits    Limitations
SMP
NUMA
Clusters
MPP

2 Staying in your small group, discuss each of the following questions.
a What is parallelism?
b Why is it important to the data warehouse?

Lesson 9: Planning Warehouse Storage

Overview
[Course road map: Defining DW Concepts & Terminology; Planning for a Successful Warehouse; Meeting a Business Need; Choosing a Computing Architecture; Planning Warehouse Storage (highlighted); Modeling the Data Warehouse; ETT (Building the Warehouse); Analyzing User Query Needs; Supporting End User Access; Managing the Data Warehouse; Project Management (Methodology, Maintaining Metadata)]
Overview
The previous lesson covered choosing a computing architecture. This lesson discusses planning warehouse storage. Note that the “Planning Warehouse Storage” block is highlighted in the course road map. Specifically, this lesson examines database setup and management issues such as partitioning, indexing, and ways to protect your database.

Objectives
After completing this lesson, you should be able to do the following:
• Discuss different partitioning methods and types of indexes
• Consider the benefits and limitations of different RAID levels in protecting the database

Data Partitioning
• Breaking up of data into separate physical units that can be handled independently
• Ease of:
– Restructuring
– Reorganization
– Removal
– Recovery
– Monitoring
– Management
– Archiving
– Indexing
[Figure: an Order table partitioned by month (Jan 98, Feb 98, Mar 98). A partition can be added or dropped, and other data is not affected.]

Objects to Partition
• Tables:
– Fact
– Dimension
• Indexes
The Server Data Architecture

Data Partitioning
Partitioning enables you to break tables down into smaller, more manageable units, thus addressing the problems of supporting large tables and indexes (which are inherent in data warehouses). A large table is broken into many smaller physical tables or views, which are then pulled together again for queries that access data from more than one of them. The data may be partitioned horizontally or vertically.

Partitioning helps in the following ways:
• Improves the speed of access and data management by eliminating the need to visit all of the vertical or horizontal partitions during query and backup tasks
• Increases availability by reducing the time needed to perform warehouse management tasks (such as load) and by making it possible to take one area of the database offline while keeping the others active

You partition fact data to break the large volumes of data up into smaller units. Partitioned data can easily be:
• Restructured
• Reorganized
• Removed
• Recovered
• Monitored
• Managed
• Archived
• Indexed, with improved sequential data scanning

Note: In determining objects to partition, you use partitioning initially on the fact table, because it is the largest and requires the most management and maintenance. However, you can use partitioning on any table in the data warehouse. You should also partition indexes.
Horizontal Partitioning
• Table and index data are split by:
– Time
– Sales region or person
– Geography
– Organization
– Line of business
• Candidate columns appear in the WHERE clause
• Analysis determines the requirement

Data Partitioning (continued)
There are two broad categories of partitioning: horizontal partitioning and vertical partitioning.

Horizontal Partitioning
Horizontal partitioning is commonly used in warehouse environments because it enables you to store a very large table in smaller tables. It gives the database administrator control over the rows that go into each table. For example, 12 months of data can be stored in 12 tables or views, one for each month. The advantage, when querying data, is that full table scans are reduced. A query that requires information for the month of February merely scans a single table or view of the data.

Warehouse partitioning can be based on different criteria, but usually one or more of the following:
• Time
• Sales region
• Sales person
• Geographical unit
• Organization
• Line of business

Example: Partitioning by time is most common, because most of the information you need for analysis is based on time periods.
Partitioning by time is also effective for loading and archiving tasks. You can insert a new data table into the warehouse for each month, and easily remove (drop) the oldest table.

Vertical Partitioning
With vertical partitioning, you break tables up on a column-by-column basis. You may use vertical partitioning when:
• It would improve the speed of query and update actions.
• Users require access to specific columns. It is useful if queries are specifically on a small number of columns rather than a whole row, or if you want to control visibility of sensitive data, such as salary figures on a payroll (HR) system.
• Some data is changed infrequently. You can keep the infrequently changed data in a separate partition. It is easier to manage data this way, and you can make some of the attributes globally read-only. You can also store less frequently accessed data on CD-ROM and in a carousel or cartridge unit.
• Descriptive dimension text may be better moved away from the dimension itself.

Initial partitioning strategies are normally used in the first implementation of the warehouse. After the warehouse has been in use, analysis and review of performance, users’ query techniques, and data management strategies often reveal the need for further or alternative partitioning. Continually review the strategy.
Partitioning Methods
The partitioning methods that are available for Oracle8 and Oracle8i are listed below.
• Range Partitioning (Oracle8 and Oracle8i)
Range partitioning has been available since Oracle8. This option supports partitioning data based on ranges of values. Range partitioning guarantees that only data with a particular set of values is contained in each partition. Range partitioning is good for rolling windows of data.
• Hash Partitioning (Oracle8i)
Hash partitioning is a new feature of Oracle8i. Hash partitioning reduces administrative complexity by providing many of the manageability benefits of partitioning with minimal configuration effort. When implementing hash partitioning, the administrator simply chooses a partitioning key and the number of partitions. Oracle8i automatically distributes the data evenly across all partitions. Hash partitioning is particularly appropriate for tables that do not have a natural partitioning key.
• Composite Partitioning (Oracle8i)
Composite partitioning partitions data using the range method and, within each partition, subpartitions it using the hash method. This new type of partitioning, which is available only in Oracle8i, supports historical operations data at the partition level, and parallelism (parallel DML) and data placement at the subpartition level. Composite partitioning is well suited to both historical data and data placement.
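The three methods can be illustrated with a minimal Python sketch. This is not how the Oracle server assigns rows internally: the partition names, the date bounds, the CRC32 hash, and the subpartition count are all assumptions chosen for illustration.

```python
import zlib
from datetime import date

# Hypothetical range partitions: each one holds rows dated before its upper bound.
RANGE_BOUNDS = [
    ("sales_jan98", date(1998, 2, 1)),
    ("sales_feb98", date(1998, 3, 1)),
    ("sales_mar98", date(1998, 4, 1)),
]

def range_partition(order_date):
    """Range method: return the first partition whose upper bound exceeds the date."""
    for name, upper_bound in RANGE_BOUNDS:
        if order_date < upper_bound:
            return name
    raise ValueError("no partition covers %s" % order_date)

def hash_partition(key, n_partitions):
    """Hash method: map a key to a partition number with a deterministic hash.

    CRC32 is only a stand-in; the server's real hash function is not shown here.
    """
    return zlib.crc32(str(key).encode("utf-8")) % n_partitions

def composite_partition(order_date, customer_id, n_subpartitions=4):
    """Composite method: range-partition by date, then hash-subpartition by key."""
    return (range_partition(order_date),
            hash_partition(customer_id, n_subpartitions))
```

For example, `range_partition(date(1998, 2, 14))` returns `"sales_feb98"`. Dropping the oldest entry from `RANGE_BOUNDS` and appending a new month mirrors the rolling window that range partitioning suits.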
Two new partitioning methods introduced in Oracle8i, hash and composite partitioning, offer improvements for tables that do not lend themselves naturally to range partitioning, in one or more of the following areas:
• Ease of specification
• Simplicity of management for support of parallelism
• Reduction in skew in the amount of resources required to perform maintenance operations (such as export or backup) on different partitions of a table
• Performance, by adding support for partitionwise joins and intrapartition parallel data manipulation language (DML)
• Ability to take better advantage of hierarchical storage management solutions

Benefits of Partitioning
A major reason for supporting partitioned objects in Oracle8 and Oracle8i was the dramatic increase in the size of database objects (for example, tables) and the need to:
• Reduce downtime (owing to scheduled maintenance and data failures)
• Improve performance through partition elimination (also called partition pruning)
• Improve manageability and ease of configuration

Star Query Optimization
Optimum performance with star schema models:
1. Dimensions are queried to create a
2. Cartesian product, computed against
3. Smaller reference tables.
4. The result is joined to
5. A fact table to produce a query result.
Star Transformation
[Figure: a star schema in which Product_Table (Key 1, Brand; sample row 1001, ABC), Market_Table (Key 2, Stat; sample row 1002, SF), and Time_Table (Key 3, Year, Month; sample row 1003, 1998, March) surround Fact_Table. The parameter STAR_TRANSFORMATION_ENABLED is shown beneath the schema.]

Fact_Table
Key 1    Key 2    Key 3    Dollars
1001     1002     1003     6000
2001     2002     2003     10000
3001     3002     3003     15200
4001     4002     4003     9526

Star Query Optimization
A star query is a mechanism that provides high levels of performance when querying data in a star or snowflake model (a natural representation for most warehouses). Optimizers that support star query execution can handle the complex joins with a specific execution plan. The star query works by accessing dimensions to create a Cartesian product, which is computed against smaller reference tables. The result is joined to the fact table, which is scanned once to produce the query result.

Note: The Oracle server cost-based optimizer supports this technique.

Star Transformation
The star transformation is a cost-based query transformation aimed at executing star queries efficiently. Whereas the star optimization works well for schemas with a small number of dimensions and dense fact tables, the star transformation may be considered as an alternative if any of the following holds true:
• The number of dimensions is large.
• The fact table is sparse.
• There are queries where not all dimension tables have constraining predicates.

The STAR_TRANSFORMATION_ENABLED parameter specifies whether a cost-based query transformation is applied to star queries. The default value is FALSE. This parameter can be set dynamically using the ALTER SESSION command.
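The star query steps described in this section (filter the dimensions, form a Cartesian product of the qualifying keys, then make a single pass over the fact table) can be sketched in a few lines of Python. This is only a conceptual illustration, not the optimizer's actual execution plan. The fact rows follow the slide; the dimension rows beyond the first one in each table are hypothetical.

```python
from itertools import product

# Dimension tables as dictionaries. Only keys 1001 and 1003 come from the slide;
# the remaining rows are made up for illustration.
product_dim = {1001: "ABC", 2001: "ABC", 3001: "XYZ", 4001: "XYZ"}
time_dim = {
    1003: (1998, "March"),
    2003: (1998, "March"),
    3003: (1998, "April"),
    4003: (1998, "April"),
}

# Fact rows: (product_key, market_key, time_key, dollars), as on the slide.
fact = [
    (1001, 1002, 1003, 6000),
    (2001, 2002, 2003, 10000),
    (3001, 3002, 3003, 15200),
    (4001, 4002, 4003, 9526),
]

def star_query(brand, month):
    """Total dollars for one brand in one month."""
    # Step 1: query each small dimension table on its own predicate.
    product_keys = [k for k, b in product_dim.items() if b == brand]
    time_keys = [k for k, (_, m) in time_dim.items() if m == month]
    # Step 2: Cartesian product of the qualifying dimension keys.
    wanted = set(product(product_keys, time_keys))
    # Step 3: scan the fact table once, keeping rows that match a key pair.
    return sum(dollars for p, _, t, dollars in fact if (p, t) in wanted)
```

With this sample data, `star_query("ABC", "March")` adds the first two fact rows. The market dimension carries no predicate in this sketch, so it is not consulted at all.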
Indexing Data
By intelligently indexing data in your data warehouse, you can increase both the performance and the scalability of your warehouse solution. Using indexes, you can replace a full table scan by a quick read of the index followed by a read of only those disk blocks that contain the rows needed. The types of indexes are described below.

B-Tree Indexes
This is the most common type of indexing, used for high cardinality columns, and designed for queries that return few rows. Rather than scanning an entire table to find rows where a certain column satisfies a WHERE clause predicate, you instead create a separate index structure on that column.
This index structure contains a sorted list of all the actual discrete column values, and each value in the index is associated with a list of pointers to all the rows in the original table that contain that value. The index is stored internally using a balanced tree (B-tree) representation in order to allow the database engine to quickly find any element in the sorted list.

Note: Cardinality is defined as the number of distinct key values expressed as a percentage of the number of rows in the table. For example, a million-row table with five distinct values has a low cardinality, while a 100-row table with 80 distinct values has a high cardinality.

Bitmap Indexes
• Provide performance benefits and storage savings
• Store values as 1s and 0s
• Use instead of B-tree indexes when:
– Tables are large
– Columns have relatively low cardinality

Bitmap index on product color:
Blue  - 1000100100010010100
Green - 0001010000100100000
Mauve - 0100000011000001001
Gold  - 0010001000001000010

Oracle8 and Oracle8i Index Enhancements
• Oracle8 index enhancements:
– Partitioned index
– Index-organized tables
• Oracle8i index enhancements:
– Function-based index
– New bitmap index improvements
– Online index build and rebuild
– Descending index
– Statistics can be collected when an index is created
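The bitmap index on product color can be sketched in Python as one integer per distinct value, with bit i set when row i holds that value. This is a simplified model only: real bitmap indexes are compressed and stored very differently. The `sizes` column and the sample rows are hypothetical, added to show how two bitmaps combine.

```python
def build_bitmap_index(values):
    """Return {distinct value: bitmap}; bit i is set when row i holds the value."""
    index = {}
    for row, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row)
    return index

def rows_for(bitmap):
    """Decode a bitmap back into an ascending list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical six-row product table with two low-cardinality columns.
colors = ["Blue", "Mauve", "Gold", "Green", "Blue", "Green"]
sizes = ["S", "L", "S", "S", "L", "L"]

color_idx = build_bitmap_index(colors)
size_idx = build_bitmap_index(sizes)

# WHERE color IN ('Blue', 'Green') AND size = 'L' becomes bitwise OR and AND.
hits = (color_idx["Blue"] | color_idx["Green"]) & size_idx["L"]
print(rows_for(hits))  # prints [4, 5]
```

Combining predicates with bitwise operators, before any table row is touched, is the reason bitmaps suit large tables with low cardinality columns.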
Bitmap Indexes
Bitmap indexes provide substantial performance benefits and storage savings. When a bitmap index is created on a column, a bit stream (ones and zeros) is created for each distinct value in the indexed column. Bitmap indexes are useful on low cardinality data, because scanning 1s and 0s is much more efficient than scanning data values. Bitmap indexes are an alternative to normal B-tree indexes in the following situations:
• The table is large (millions of rows).
• Columns have low cardinality index key values.

Oracle8 and Oracle8i Index Enhancements
• Partitioned Indexes (Oracle8)
You may choose to partition B-tree or bitmap indexes in sync with your table partitioning strategy. These are called local indexes. Indexes may be prefixed (synchronized with the tablespace), nonprefixed (related to columns not in the partition), or global (the index is partitioned differently from the table).
• Index-Organized Tables (Oracle8)
The data for the table is held in the index, and changes to data result only in changes to the index. Access can be by the primary key or by any other key that is a valid prefix of the primary key. Standard SQL is used to access these tables. They provide faster key-based access for exact match or range searches, and storage requirements are reduced because index and key values are stored only once and the ROWID value is not required.
• Oracle8i Index Enhancements
The following are the index enhancements for Oracle8i:
– Function-based index: Allows a warehouse administrator to build an index on a function. A common use of a function-based index is the creation of case-insensitive indexes, which can be implemented by creating an index on the uppercase function applied to a character column.
– New bitmap index improvements: Reduction in compress or uncompress operations.
– Online index build and rebuild: Rebuilding indexes and index-organized tables can be done without locking the table.
– Descending index: Indexes in Oracle8i can be stored in descending order of key values.

Protecting the Database
• RAID is essential with large databases
• RAID improves:
– Reliability
– Storage management
• There are different levels of RAID
• You can eliminate disk contention with disk striping

You must consider using some form of protection against media failure, such as mirroring or RAID (Redundant Array of Independent Disks) technology, so that the data warehouse can be restored to its original state. This protection is valuable, even in a small database, because it can often avoid the need for recovery. The larger a database, the greater the necessity and the bigger the cost of this sort of technology.
RAID
RAID achieves data accessibility benefits in a cost-effective manner:
• Improved reliability (fault tolerance)
• Enhanced storage management

RAID Levels
There are a number of different levels of RAID:
• RAID Level 0: Striping without parity (DSA)
• RAID Level 0+1: Mirrored striping
• RAID Level 1: Mirrored disk array (MDA)
• RAID Level 3: Data striping with byte level parity
• RAID Level 4: Same as RAID 3, but with block level parity
• RAID Level 5: Independent Disk Array (IDA)

Note: RAID Levels 0, 1, and 5 are discussed in the following sections because these are found to be most useful.

In a data warehouse where the workload profile is unknown, you should use machine striping for all objects. To eliminate contention for disks, you should ensure that tables that are subject to multiple concurrent parallel scans are given a dedicated set of disks, striped to give the necessary I/O bandwidth and load balancing abilities.

The stripe size is a hotly debated issue. It impacts table scan performance as well as database operational issues, such as backups and restores. When setting the stripe size, the administrator should ensure that each I/O can be satisfied within one stripe.

RAID 0: Striping
[Figure: a disk array controller writes File A, segments (a) through (f), to a four-drive disk array: Block 1 on Drive 1, Block 2 on Drive 2, and so on, with Block 5 in another sector on Drive 1.]
RAID Level 0: Striping
RAID-0 spreads (stripes) the database across hardware volumes. Striping data spreads the I/O load across multiple disks, increasing throughput. There is a tradeoff between performance and resilience. The more disks there are, the more files end up with a portion on any single disk, and inevitably the more files are lost if there is a disk failure. This makes the use of mirroring or RAID technology all the more important.

In the example, you see a file written to a four-drive disk array. Data is striped by system block size, in increments of one segment at a time (the segment size is a system-dependent feature). Independent data paths go to the drives, and the spreading of segment-length portions of data is repeated across the entire disk array.

Benefits:
• Is good for simultaneous reads and writes, which benefits applications that produce very large files
• Incurs no disk redundancy overhead, because data is striped by system block size, in increments of one segment at a time
• Provides a scalable solution

Limitations:
• Is not recommended for mission-critical systems
• Provides no recovery from data loss
• Enables one bad sector to affect the entire disk of data
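The four-drive example (Block 1 on Drive 1, Block 2 on Drive 2, and Block 5 back on Drive 1) amounts to a round-robin mapping. The Python sketch below shows that arithmetic only. It assumes one segment per block and ignores controller details; the function names are illustrative.

```python
def stripe_location(block_number, n_drives=4):
    """Map a 1-based logical block to (drive, stripe row), both 1-based."""
    index = block_number - 1
    return index % n_drives + 1, index // n_drives + 1

def stripe(segments, n_drives=4):
    """Distribute segments round-robin across the drives of the array."""
    drives = [[] for _ in range(n_drives)]
    for i, segment in enumerate(segments):
        drives[i % n_drives].append(segment)
    return drives

# File A, segments (a) through (f), written to a four-drive array.
layout = stripe(["a", "b", "c", "d", "e", "f"])
for number, contents in enumerate(layout, start=1):
    print("Drive", number, contents)
```

Because consecutive blocks land on different drives, a large sequential read keeps all four drives busy at once. It also means that losing any one drive damages every file that is large enough to span the array.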
RAID 1: Mirrored Disk
[Figure: a disk array controller writes File A, segments (a) and (b), to Disk 1 and to the Disk 1 mirror, and File B, segments (c), (d), and (e), to Disk 2 and to the Disk 2 mirror. A copy of the files is stored on the mirror disk.]

RAID Level 1: Mirrored
RAID Level 1 (or mirroring) provides the simplest level of redundancy. One primary disk is mirrored by another disk in the RAID set. The number of mirror disks is scalable, but the capacity of the RAID set is not. Mirroring doubles the size of your disk set. Higher levels of RAID require special equipment but reduce the number of extra disks needed. This enables you to get more data onto a system.

This method gives the following benefits:
• Complete data redundancy
• No performance penalty; in fact, RAID-1 improves performance for reads
• Scalability

The limitation of this method is that it bears the highest cost of all RAID configurations.
RAID 5: Independent Disk Array
[Figure: a disk array controller stripes File C, segments (a) through (j), across Disk 1 to Disk 4, together with the parity segments P (a,b,c), P (d,e,f,g,h), and P (i,j). Data is striped with parity across the array.]

RAID 5: Independent Disk Array
• Benefits:
– Efficient data integrity
– Data reconstruction
– Multiple concurrent seeks across array
– Scalable
• Limitations:
– Disk overhead
– Data write rate
A warehouse typically uses RAID 0, 1, or 5.

RAID Level 5: Independent Disk Array (IDA)
RAID-0 is designed to engage all disk drives in the array at the same time on the same read and write operation. However, RAID-5 is designed to engage as many drives as possible at the same time on different read and write operations. The stripe size is system-dependent, as with RAID-0. When the host sends a portion of data to be written to disk, the RAID controller breaks it up into smaller portions, according to the stripe size, and writes the portions to the disks in parallel. The parity information is interleaved throughout the disk array and is marked by a parity segment.
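Parity in RAID-5 is computed as the bytewise exclusive OR (XOR) of the data segments in a stripe, which is what allows a lost segment to be rebuilt from the survivors. The Python sketch below shows the arithmetic for a single stripe. The segment contents are made-up sample bytes, and the rotation of parity segments between disks is left out.

```python
from functools import reduce

def parity(segments):
    """XOR equal-length byte segments together, one byte column at a time."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*segments))

# One stripe: three data segments (hypothetical contents) plus their parity.
data = [b"File C a", b"File C b", b"File C c"]
p = parity(data)

# Simulate losing the disk that held the second segment, then rebuild that
# segment by XOR-ing the surviving data segments with the parity segment.
rebuilt = parity([data[0], data[2], p])
print(rebuilt == data[1])  # prints True
```

The same XOR also explains the write penalty: updating one data segment means the parity segment for that stripe must be recalculated and rewritten as well.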
This method gives the following benefits:
• Efficient data integrity
• Reconstruction of data from a single failed disk, using the data and parity segments on the remaining disks
• Multiple concurrent seeks across the disk array
• Scalability

The limitations of this method are the disk overhead and the reduced data write rate.

Note: A typical data warehouse employs RAID-0, RAID-1, or RAID-5.

Backup
The backup and recovery strategy for a warehouse needs to be considered at the design stage. Details such as how the data is partitioned greatly affect the strategy. For small and medium databases, daily cold backups (taken while all instances of the database are shut down) and export/import are viable backup tools. However, once you move to very large databases (VLDBs), complete cold backups become difficult to fit into an overnight window. In addition, the disk space required for a complete export of a large database becomes an issue. You need to consider other strategies, such as using tape or other devices.
The defined backup strategy for the warehouse should allow for hot backups, where you can back up any part of the database at any time of the day, while the database instances are still active. With Oracle, this means backing up individual tablespaces while they remain active.

You should back up every component that is essential to warehouse operations, that is, everything required to restore a working environment:
• Fact data
• Dimension data
• Data warehouse and metadata schema
• Data warehouse metadata

Export/Import

The export/import utility enables all or part of a database to be extracted into a dump file and then imported into another database (under another owner if required). Generally, export/import of a VLDB uses too much disk space. You could use named pipes to a disk on a UNIX system to overcome the space problems. However, this technique would be very time-consuming.

Summary

This lesson discussed the following topics:
• Vertical partitioning and horizontal partitioning
• The different types of partitioning methods
• The differences between B-tree indexes and bitmap indexes
• Why a warehouse typically uses RAID 0, 1, or 5 to protect the database
Practice 9-1 Overview

This practice covers the following topics:
• Defining the partitioning method
• Identifying the indexing method
• Determining RAID levels and justifying each level

Practice 9-1

1 For each of the following descriptions, state the type of partitioning method it best describes. The partitioning methods are range partitioning, hash partitioning, and composite partitioning.

Description

Places specific ranges of table entries on different disks. For example, records having “name” as a key may have names beginning with A-B in one partition, C-D in the next, and so on. Likewise, a DSS managing monthly operations might partition each month onto a different set of disks.

Distributes DBMS data evenly across the set of disk spindles. This partitioning method is applied to one or more database keys, and the records are distributed across disk subsystems accordingly.
The drawback of this partitioning method is that the quantity of data may vary significantly from one partition to another, and the frequency of data access may vary as well. For example, as the data accumulates, it may turn out that a larger number of customer names fall into the M-N range than the A-B range.

This partitioning method is a combination of two partitioning methods. A table that is partitioned using this method is initially partitioned by range, and then subpartitioned using the hash method.

[Answer column: Partitioning Method]

2 For each of the following descriptions, state the type of indexing method it best describes. The indexing methods are B-tree, bitmap, and index-organized tables.

Description

Contains a hierarchy of highest-level and succeeding lower-level index blocks. The upper-level blocks are called branch blocks, and they point to the lower-level blocks. The leaf blocks are the lowest-level blocks, and they contain the unique ROWID that points to the location of the actual row.

This indexing method will benefit queries in which the WHERE clause contains multiple predicates on low-cardinality columns.

Table Row ID    Male    Female
0001            1       0
0002            0       1
0003            0       1
0004            1       0

Each row has a bit for each key. Each key value has a bit for each row.

This method merges table data and index data into one structure. Thus, the data is the index and the index is the data.

[Answer column: Indexing Method]

3 Form small groups and consider each of the following questions. Discuss each question in your group, and present your group’s answers to the class at the end of the discussion.
a How does RAID-5 differ from RAID-1?
b How do I decide between RAID-5 and RAID-1?
c What variables can affect the performance of a RAID-5 device?
d What types of files are suitable for placement on RAID-5 devices?

4 For each of the descriptions below, assign the RAID level: RAID Level 0, RAID Level 1, or RAID Level 5.

Description

This RAID level has the lowest cost and highest performance.

This RAID level is low cost and has high availability.

This RAID level has high performance and high availability.

[Answer column: RAID Level]