Data Warehouse and Business Intelligence
Dr. Minder Chen
Fall 2009

BI
Business Intelligence (BI) is the process of gathering meaningful information to answer questions and identify significant trends or patterns, giving key stakeholders the ability to make better business decisions.
“The key in business is to know something that nobody else knows.” -- Aristotle Onassis
“To understand is to perceive patterns.” -- Sir Isaiah Berlin
“The manager asks how and when, the leader asks what and why.” -- “On Becoming a Leader” by Warren Bennis
© Minder Chen, 2004-2010

BI Questions
• What happened?
  – What were our total sales this month?
• What’s happening?
  – Are our sales going up or down? (trend analysis)
• Why?
  – Why have sales gone down?
• What will happen?
  – Forecasting & “What If” analysis
• What do I want to happen?
  – Planning & targets
Source: Bill Baker, Microsoft

Business Intelligence
[Figure: pyramid of layers with increasing potential to support business decisions (MIS). Layers, top to bottom:]
• Making Decisions
• Data Presentation: Visualization Techniques
• Data Mining: Information Discovery
• Data Exploration: OLAP, MDA, Statistical Analysis, Querying and Reporting
• Data Warehouses / Data Marts
• Data Sources: Paper, Files, Information Providers, Database Systems, OLTP
Roles shown beside the pyramid, top to bottom: End User, Business Analyst, Data Analyst, DBA

Inmon's Definition of Data Warehouse – Data View
• A warehouse is a
  – subject-oriented,
  – integrated,
  – time-variant and
  – non-volatile
  collection of data in support of management's decision making process.
Bill Inmon, 1990. Source: http://www.intranetjournal.com/features/datawarehousing.html

Inmon's Definition Explained
• Subject-oriented: Data warehouses are organized around major subjects such as customer, supplier, product, and sales. They focus on modeling and analysis to support planning and management decisions rather than operations and transaction processing.
• Integrated: Data warehouses integrate multiple sources such as relational databases, flat files, and online transaction records. Data cleansing and data scrubbing make naming conventions, encoding structures, and attribute measures consistent.
• Time-variant: Data in the warehouse provide information from a historical perspective.
• Non-volatile: Data in the warehouse are physically separate from data in the operational environment.

The Data Warehouse Process
[Figure: data flows from Source Systems through Extract, Transform, & Load (ETL) into the Data Warehouse, then into Data Marts and cubes, and on to Clients (Query Tools, Reporting, Analysis, Data Mining).]
Steps:
1. Design the Data Warehouse
2. Populate Data Warehouse
3. Create OLAP Cubes
4. Query Data

OLTP Normalized Design
[Figure: entity-relationship diagram of a normalized OLTP design. Entity labels as extracted: Warehouse Ordering Process Chain Retailer Store Retailer Payments Retailer Returns Product POS Process Retail Promo Brand GL Account Retail Cust Cash Register Clerk]

OLTP Versus Business Intelligence: Who asks what?
OLTP Questions
• When did that order ship?
• How many units are in inventory?
• Does this customer have unpaid bills?
• Are any of customer X’s line items on backorder?
Analysis Questions
• What factors affect order processing time?
• How did each product line (or product) contribute to profit last quarter?
• Which products have the lowest Gross Margin?
• What is the value of items on backorder, and is it trending up or down over time?

OLTP vs. OLAP
Source: http://www.rainmakerworks.com/pdfdocs/OLTP_vs_OLAP.pdf#search=%22OLTP%20vs.%20OLAP%22

Dimensional Design Process
Inputs shown in the figure: Business Requirements and Data Realities
• Select the business process to model
• Declare the grain of the business process/data in the fact table
• Choose the dimensions that apply to each fact table row
• Identify the numeric facts that will populate each fact table row

Select a business process to model
• Not business departments or business functions
• Cross-functional business processes
• Business events
• Examples:
  – Raw materials purchasing
  – Order fulfillment process
  – Shipments
  – Invoicing
  – Inventory
  – General ledger

Requirements
[Figure]

Identifying Measures and Dimensions
Measures (performance measures for KPIs)
The attribute varies continuously:
• Balance
• Units Sold
• Cost
• Sales
Dimensions (performance drivers)
The attribute is perceived as a constant or discrete value:
• Description
• Location
• Color
• Size

A Dimensional Model for Grocery Store Sales
[Figure]

Product Dimension
• SKU: Stock Keeping Unit
• Hierarchy:
  – Department → Category → Subcategory → Brand → Product

Creating Dimensional Model
• Identify fact tables
• Translate business measures into fact tables
• Analyze source system information for additional measures
• Identify base and derived measures
• Document additivity of measures
• Identify dimension tables
• Link fact tables to the dimension tables
• Create views for users

Transaction Level Order Item Fact Table
[Figure]
Always has a date dimension.

Inside a Dimension Table
• Dimension table key: Uniquely identifies each row. Use a surrogate key (integer).
• Table is wide: A table may have many attributes (columns).
• Textual attributes: Descriptive attributes in string format. No numerical values for calculation.
• Attributes not directly related: E.g., product color and product package size. No transitive dependency.
• Not normalized (star schema).
• Drilling down and rolling up along a dimension.
• One or more hierarchies within a dimension.
• Fewer records than a fact table.

Fact Tables
Fact tables have the following characteristics:
• Contain numeric measures (metrics) of the business
• May contain summarized (aggregated) data
• May contain date-stamped data
• Contain measures that are typically additive
• Have a key that is typically a concatenated key composed of the primary keys of the dimensions
• Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables
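
Example: a small star schema
The dimension and fact table characteristics above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table names (dim_date, dim_product, fact_sales), the column names, the helper functions build_warehouse and sales_by, and all sample rows are hypothetical and do not come from the slides. The subcategory level of the product hierarchy is left out to keep the sketch short.

```python
import sqlite3

# Hypothetical star schema: two dimension tables and one fact table.
SCHEMA = """
CREATE TABLE dim_date (
    date_key    INTEGER PRIMARY KEY,   -- surrogate key (integer)
    full_date   TEXT,
    month       TEXT,
    year        INTEGER
);
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY,   -- surrogate key (integer)
    sku         TEXT,                  -- textual, descriptive attributes
    product     TEXT,
    brand       TEXT,
    category    TEXT,
    department  TEXT                   -- hierarchy kept in one denormalized table
);
CREATE TABLE fact_sales (
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    units_sold   INTEGER,              -- additive measure
    sales_amount REAL,                 -- additive measure
    PRIMARY KEY (date_key, product_key)  -- concatenated key of dimension keys
);
"""

# Levels of the product hierarchy that a query may group by.
HIERARCHY_LEVELS = ("department", "category", "brand", "product")


def build_warehouse():
    """Create an in-memory database and load the made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?, ?)", [
        (1, "2009-09-01", "2009-09", 2009),
        (2, "2009-10-01", "2009-10", 2009),
    ])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?, ?, ?)", [
        (1, "SKU-001", "Whole Milk 1L", "DairyBest", "Milk", "Dairy"),
        (2, "SKU-002", "Cheddar 200g", "DairyBest", "Cheese", "Dairy"),
        (3, "SKU-003", "Sourdough Loaf", "BakeCo", "Bread", "Bakery"),
    ])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
        (1, 1, 10, 15.0),
        (1, 2, 4, 18.0),
        (1, 3, 6, 21.0),
        (2, 1, 12, 18.0),
        (2, 3, 5, 17.5),
    ])
    conn.commit()
    return conn


def sales_by(conn, level, department=None):
    """Sum the additive measures at one hierarchy level.

    Passing a department restricts the result to that department,
    which is how the drill-down below is expressed.
    """
    if level not in HIERARCHY_LEVELS:
        raise ValueError(f"unknown hierarchy level: {level}")
    # The column name is checked against HIERARCHY_LEVELS before being
    # placed in the SQL text; the department value is bound as a parameter.
    sql = f"""
        SELECT p.{level}, SUM(f.units_sold), SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        WHERE (? IS NULL OR p.department = ?)
        GROUP BY p.{level}
        ORDER BY p.{level}
    """
    return conn.execute(sql, (department, department)).fetchall()


conn = build_warehouse()
print(sales_by(conn, "department"))                   # rolled up to department
print(sales_by(conn, "category", department="Dairy")) # drilled down into Dairy
```

Grouping by a higher level of the hierarchy rolls the measures up, and grouping by a lower level within one department drills down. Both queries read the same fact rows, which works because units_sold and sales_amount are additive.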