* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download 9780538469685_PPT_ch13 - MCST-CS
Survey
Document related concepts
Transcript
Database Systems: Design, Implementation, and Management Ninth Edition Chapter 13 Business Intelligence and Data Warehouses Objectives In this chapter, you will learn: • How business intelligence is a comprehensive framework to support business decision making • How operational data and decision support data differ • What a data warehouse is, how to prepare data for one, and how to implement one • What star schemas are and how they are constructed Database Systems, 9th Edition 2 Objectives (cont’d.): • What data mining is and what role it plays in decision support • About online analytical processing (OLAP) • How SQL extensions are used to support OLAP-type data manipulations Database Systems, 9th Edition 3 The Need for Data Analysis • Managers track daily transactions to evaluate how the business is performing • Strategies should be developed to meet organizational goals using operational databases • Data analysis provides information about shortterm tactical evaluations and strategies Database Systems, 9th Edition 4 Business Intelligence • Comprehensive, cohesive, integrated tools and processes – Capture, collect, integrate, store, and analyze data – Generate information to support business decision making • Framework that allows a business to transform: – Data into information – Information into knowledge – Knowledge into wisdom Database Systems, 9th Edition 5 Business Intelligence Architecture • Composed of data, people, processes, technology, and management of components • Focuses on strategic and tactical use of information • Key performance indicators (KPI) – Measurements that assess company’s effectiveness or success in reaching goals • Multiple tools from different vendors can be integrated into a single BI framework Database Systems, 9th Edition 6 Database Systems, 9th Edition 7 Decision Support Data • BI effectiveness depends on quality of data gathered at operational level • Operational data seldom well-suited for decision support tasks • Need reformat data in order to be useful for business intelligence Database Systems, 9th Edition 8 Operational Data vs. Decision Support Data • Operational data – Mostly stored in relational database – Optimized to support transactions representing daily operations • Decision support data differs from operational data in three main areas: – Time span – Granularity – Dimensionality Database Systems, 9th Edition 9 Database Systems, 9th Edition 10 Decision Support Database Requirements • Specialized DBMS tailored to provide fast answers to complex queries • Four main requirements: – – – – Database schema Data extraction and loading End-user analytical interface Database size Database Systems, 9th Edition 11 Decision Support Database Requirements (cont’d.) • Database schema – Complex data representations – Aggregated and summarized data – Queries extract multidimensional time slices • Data extraction and filtering – Supports different data sources • Flat files • Hierarchical, network, and relational databases • Multiple vendors – Checking for inconsistent data Database Systems, 9th Edition 12 Decision Support Database Requirements (cont’d.) • End-user analytical interface – One of most critical DSS DBMS components – Permits user to navigate through data to simplify and accelerate decision-making process • Database size – In 2005, Wal-Mart had 260 terabytes of data in its data warehouses – DBMS must support very large databases (VLDBs) Database Systems, 9th Edition 13 The Data Warehouse • Integrated, subject-oriented, time-variant, and nonvolatile collection of data – Provides support for decision making • Usually a read-only database optimized for data analysis and query processing • Requires time, money, and considerable managerial effort to create Database Systems, 9th Edition 14 Database Systems, 9th Edition 15 The Data Warehouse (cont’d.) • Data mart – Small, single-subject data warehouse subset – More manageable data set than data warehouse – Provides decision support to small group of people – Typically lower cost and lower implementation time than data warehouse Database Systems, 9th Edition 16 Twelve Rules That Define a Data Warehouse • Data warehouse and operational environments are separated • Data warehouse data are integrated • Data warehouse contains historical data over long time • Data warehouse data are snapshot data captured at given point in time • Data warehouse data are subject-oriented Database Systems, 9th Edition 17 Twelve Rules That Define a Data Warehouse (cont’d.) • Data warehouse data are mainly read-only – Periodic batch updates from operational data – No online updates allowed • Data warehouse development life cycle differs from classical systems development • Data warehouse contains data with several levels of detail: – Current detail data, old detail data, lightly summarized data, and highly summarized data Database Systems, 9th Edition 18 Twelve Rules That Define a Data Warehouse (cont’d.) • Read-only transactions to very large data sets • Data warehouse environment traces data sources, transformations, and storage • Data warehouse’s metadata are critical component of this environment • Data warehouse contains chargeback mechanism for resource usage – Enforces optimal use of data by end users Database Systems, 9th Edition 19 Decision Support Architectural Styles • Provide advanced decision support features • Some capable of providing access to multidimensional data analysis • Complete data warehouse architecture supports: – Decision support data store – Data extraction and integration filter – Specialized presentation interface Database Systems, 9th Edition 20 Online Analytical Processing • Advanced data analysis environment that supports: – Decision making – Business modeling – Operations research • Four main characteristics: – – – – Use multidimensional data analysis techniques Provide advanced database support Provide easy-to-use end-user interfaces Support client/server architecture Database Systems, 9th Edition 21 Multidimensional Data Analysis Techniques • Data are processed and viewed as part of a multidimensional structure • Augmented by the following functions: – Advanced data presentation functions – Advanced data aggregation, consolidation, and classification functions – Advanced computational functions – Advanced data modeling functions Database Systems, 9th Edition 22 Database Systems, 9th Edition 23 Advanced Database Support • Advanced data access features include: – Access to many different kinds of DBMSs, flat files, and internal and external data sources – Access to aggregated data warehouse data – Advanced data navigation – Rapid and consistent query response times – Maps end-user requests to appropriate data source and to proper data access language – Support for very large databases Database Systems, 9th Edition 24 Easy-to-Use End-User Interface • Advanced OLAP features are more useful when access is simple • Many interface features are “borrowed” from previous generations of data analysis tools – Already familiar to end users – Makes OLAP easily accepted and readily used Database Systems, 9th Edition 25 Client/Server Architecture • Provides framework for design, development, and implementation of new systems – Enables OLAP system to be divided into several components that define its architecture – OLAP is designed to meet ease-of-use as well as system flexibility requirements Database Systems, 9th Edition 26 OLAP Architecture • Operational characteristics can be divided into three main modules: – Graphical user interface (GUI) – Analytical processing logic – Data-processing logic Database Systems, 9th Edition 27 OLAP Architecture (cont’d.) • Designed to use both operational and data warehouse data • In most implementations, data warehouse and OLAP are interrelated and complementary • OLAP systems merge data warehouse and data mart approaches Database Systems, 9th Edition 28 Database Systems, 9th Edition 29 Relational OLAP • Relational online analytical processing (ROLAP) provides the following extensions: – Multidimensional data schema support within the RDBMS – Data access language and query performance optimized for multidimensional data – Support for very large databases (VLDBs) Database Systems, 9th Edition 30 Database Systems, 9th Edition 31 Multidimensional OLAP • Multidimensional online analytical processing (MOLAP) extends OLAP functionality to multidimensional database management systems (MDBMSs) – MDBMS end users visualize stored data as a 3D data cube – Data cubes can grow to n dimensions, becoming hypercubes – To speed access, data cubes are held in memory in a cube cache Database Systems, 9th Edition 32 Database Systems, 9th Edition 33 Relational vs. Multidimensional OLAP • Selection of one or the other depends on evaluator’s vantage point • Proper evaluation must include supported hardware, compatibility with DBMS, etc. • ROLAP and MOLAP vendors working toward integration within unified framework • Relational databases use star schema design to handle multidimensional data Database Systems, 9th Edition 34 Star Schema • Data-modeling technique – Maps multidimensional decision support data into relational database • Creates near equivalent of multidimensional database schema from relational data • Easily implemented model for multidimensional data analysis while preserving relational structures • Four components: facts, dimensions, attributes, and attribute hierarchies Database Systems, 9th Edition 35 Facts • Numeric measurements that represent specific business aspect or activity – Normally stored in fact table that is center of star schema • Fact table contains facts linked through their dimensions • Metrics are facts computed at run time Database Systems, 9th Edition 36 Dimensions • Qualifying characteristics provide additional perspectives to a given fact • Decision support data almost always viewed in relation to other data • Study facts via dimensions • Dimensions stored in dimension tables Database Systems, 9th Edition 37 Attributes • Use to search, filter, and classify facts • Dimensions provide descriptions of facts through their attributes • No mathematical limit to the number of dimensions • Slice and dice: focus on slices of the data cube for more detailed analysis Database Systems, 9th Edition 38 Attribute Hierarchies • Provide top-down data organization • Two purposes: – Aggregation – Drill-down/roll-up data analysis • Determine how the data are extracted and represented • Stored in the DBMS’s data dictionary • Used by OLAP tool to access warehouse properly Database Systems, 9th Edition 39 Star Schema Representation • Facts and dimensions represented in physical tables in data warehouse database • Many fact rows related to each dimension row – Primary key of fact table is a composite primary key – Fact table primary key formed by combining foreign keys pointing to dimension tables • Dimension tables are smaller than fact tables • Each dimension record is related to thousands of fact records Database Systems, 9th Edition 40 Performance-Improving Techniques for the Star Schema • Four techniques to optimize data warehouse design: – Normalizing dimensional tables – Maintaining multiple fact tables to represent different aggregation levels – Denormalizing fact tables – Partitioning and replicating tables Database Systems, 9th Edition 41 Performance-Improving Techniques for the Star Schema (cont’d.) • Dimension tables normalized to: – Achieve semantic simplicity – Facilitate end-user navigation through the dimensions • Denormalizing fact tables improves data access performance and saves data storage space • Partitioning splits table into subsets of rows or columns • Replication makes copy of table and places it in different location Database Systems, 9th Edition 42 Implementing a Data Warehouse • Numerous constraints, including: – Available funding – Management’s view of role played by an IS department • Extent and depth of information requirements – Corporate culture • No single formula can describe perfect data warehouse development Database Systems, 9th Edition 43 The Data Warehouse as an Active Decision Support Framework • Data warehouse: – Is not a static database – Is a dynamic framework for decision support that is always a work in progress • Data warehouse is critical component of modern BI environment • Design and implementation must be examined as part of entire infrastructure Database Systems, 9th Edition 44 A Company-Wide Effort That Requires User Involvement • Data warehouse data cross departmental lines and geographical boundaries • Building a data warehouse requires the designer to: – – – – – Involve end users in process Secure end users’ commitment from beginning Create continuous end-user feedback Manage end-user expectations Establish procedures for conflict resolution Database Systems, 9th Edition 45 Satisfy the Trilogy: Data, Analysis, and Users • Data warehouse designer must satisfy: – Data integration and loading criteria – Data analysis capabilities with acceptable query performance – End-user data analysis needs Database Systems, 9th Edition 46 Apply Database Design Procedures • Company-wide effort requiring many resources • Quantity of data requires latest hardware and software • Detailed procedures to orchestrate flow of data from operational databases to data warehouse • People with advanced database design, software integration, and management skills Database Systems, 9th Edition 47 Database Systems, 9th Edition 48 Data Mining • Data-mining tools do the following: – Analyze data – Uncover problems or opportunities hidden in data relationships – Form computer models based on their findings – Use models to predict business behavior • Requires minimal end-user intervention Database Systems, 9th Edition 49 SQL Extensions for OLAP • Proliferation of OLAP tools fostered development of SQL extensions • Many innovations have become part of standard SQL • All SQL commands will work in data warehouse as expected • Most queries include many data groupings and aggregations over multiple columns Database Systems, 9th Edition 50 The ROLLUP Extension • Used with GROUP BY clause to generate aggregates by different dimensions • GROUP BY generates only one aggregate for each new value combination of attributes • ROLLUP extension enables subtotal for each column listed except for the last one – Last column gets grand total • Order of column list important Database Systems, 9th Edition 51 The CUBE Extension • CUBE extension used with GROUP BY clause to generate aggregates by listed columns – Includes the last column • Enables subtotal for each column in addition to grand total for last column • Useful when you want to compute all possible subtotals within groupings • Cross-tabulations are good candidates for application of CUBE extension Database Systems, 9th Edition 52 Materialized Views • A dynamic table that contains SQL query command to generate rows – Also contains the actual rows • Created the first time query is run and summary rows are stored in table • Automatically updated when base tables are updated Database Systems, 9th Edition 53 Summary • Business intelligence generates information used to support decision making • BI covers a range of technologies, applications, and functionalities • Decision support systems were the precursor of current generation BI systems • Operational data not suited for decision support Database Systems, 9th Edition 54 Summary (cont’d.) • Four categories of requirements for decision support DBMS: – – – – Database schema Data extraction and loading End-user analytical interface Database size requirements • Data warehouse provides support for decision making – Usually read-only – Optimized for data analysis, query processing Database Systems, 9th Edition 55 Summary (cont’d.) • OLAP systems have four main characteristics: – – – – Use of multidimensional data analysis Advanced database support Easy-to-use end-user interfaces Client/server architecture • ROLAP provides OLAP functionality with relational databases • MOLAP provides OLAP functionality with MDBMSs Database Systems, 9th Edition 56 Summary (cont’d.) • Star schema is a data-modeling technique – Maps multidimensional decision support data into a relational database • Star schema has four components: – – – – Facts Dimensions Attributes Attribute hierarchies Database Systems, 9th Edition 57 Summary (cont’d.) • Four techniques optimize data warehouse design: – – – – Normalize dimensional tables Maintain multiple fact tables Denormalize fact tables Partition and replicate tables • Data mining automates analysis of operational data • SQL extensions support OLAP-type processing and data generation Database Systems, 9th Edition 58