* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Data warehouse
Survey
Document related concepts
Transcript
ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Introduction to Data Warehouse Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential Chair of Business 1 Outline • Why Data Warehouse? – Problems, causes and data warehouse solutions • What is Data Warehouse? – Characteristics and components • Current Practices of Data Warehouse 2 Why Data Warehouse? • Knowledge Management Problems (Drowning in data, starving for knowledge) 1. Can’t access data (easily) E.g., data from different branches, years, functional areas, etc. 2. Give me only what’s important (knowledge) E.g., Regions and products that have upward sales trends over the last five years. 3. I need to reduce data to what’s important by slicing and dicing. E.g., by branch, product, year, etc. 3 Why Data Warehouse? 4. Data inconsistency and poor data quality E.g., the 2001 PC sales amount in SLC from the CFO and the SLC Account Manager are not the same. 5. Need to improve the practices of making informed decisions. E.g., Did the VP for Marketing decide on the advertising budgets for branches in the SW region based on their sales performances over the last five years? 6. Hard and slow to query the database? E.g., VP for Marketing, CFO and Account Manager had to wait for the MIS Department to generate sales performance reports and analyses. 4 Why Data Warehouse? • ROI Problems 7. Can I get more value out of my data? Ans: Make informed, potent decisions using knowledge extracted from integrated and consistent data over a long period of time. 8. Can I do this cost-effectively? Options: federated (interoperable) databases vs. a data warehouse 9. Can I easily scale up or change how I get knowledge out of my data? E.g., Add more regions, functional areas or years in sales performance analyses. 5 Causes for the Problems Cause 1: Isolated databases distributed in an enterprise CRM Sales Inventory A Root cause for problems 1, 4, 5, 6, 7, 8 and 9 6 Why Data Warehouse • Cause 1: Isolated databases distributed in an enterprise Sales CRM Inventory Ad hoc access solutions cannot alleviate the problems 7 Why Data Warehouse • Cause 2: Historical data is archived in offline storage systems Sales Historical Sales Data Another Root cause for problems 1, 4, 5, 6, 7, 8 and 9 Archive 8 Why Data Warehouse • Cause 2: Historical data is archived in offline storage systems Sales Historical Sales Data Ad hoc accesses are slow and inconvenient Archive 9 Cause 3: Metadata for Transaction DB systems is Not User Friendly SSN Name SSN 1 Has M Dependent Rank Name Address Instructor 1 Student M Take M Course Name Sex Phone Relation Grade IS-A Undergraduate Major C-Name C-No Graduate Major Minor 10 11 Why Data Warehouse • Cause 4: Query and programming languages are even less user friendly – DESB students’ academic grades and GPAs since the freshman year – Sales amount distribution by product category, customer state and year – Slicing and dicing – SQL statements??? – Report/screen interface codes??? 12 Why Data Warehouse • Cause 5: Transaction databases are optimized (normalized) to process transactions but not to answer decision support queries – Bad query performance to join the normalized tables – Heavy transaction processing workload 13 What is Data Warehouse Designed to solve problems associated with current database practices: • Isolated, distributed databases Sales Inventory CRM Extract, replicate, integrate, cleanse & load Data Warehouse 14 Why Data Warehouse • Historical data is archived in offline storage systems Integrate Historical Data with Current Data Sales Historical Sales Data Archive Data Warehouse 15 What is Data Warehouse • Causes 3, 4 and 5: Hard-to-understand metadata, and query and programming languages; poor decision support query performances • Solution: In data warehouse, organize data in subject –oriented way rather than processoriented way – dimensional modeling. 16 Dimensional Modeling (Star Schema) Course Instructor . Name . Rank Academic Performance . Number . Title . Grade Student . Name . UG/PG . Major Semester . Year . Length . Start date 17 Dimensional Modeling (Star Schema) Customer Branch . Name . State . City Sales . Qty . Amt Product . Name . Category . Name . State . City Time . Year . Quarter . Month 18 One System for Multiple Uses Application Program Database Interactive Application Program Queries/ Transactions Database Management System (DBMS) Metadata Database System 19 Two Worlds -> Two Systems Executive Information System Operational Application Operational Application Operational Application OLTP DBs Operational Decision Support System (DSS) Data warehouse Reporting DSS 20 What is Data Warehouse • Data Warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decision making process. • 1. Subject-oriented means the data warehouse focuses on the high-level entities of business such as sales, products, and customers. This is in contrast to database systems, which deals with processes such as placing an order. 21 What is Data Warehouse 2. Integrated means the data is integrated from distributed data sources and historical data sources and stored in a consistent format. 3. Time-variant means the data associates with a point in time (i.e., semester, fiscal year and pay period) 4. Non-volatile means the data doesn’t change once it gets into the warehouse. 22 Characteristics of Data Warehouse Purpose OLAP Data Warehouse OLTP DB Decision Support Transaction Processing Data Dimensional Normalized Model Relational Time Span Historical and Current Data Current Data Query Scan a Scan a small processing substantial set of data subset of data Operation Read-only Read & Update 23 Data Warehouse and Data Mart • Data warehouse – defined by its decision support purpose and other characteristics – Other characteristics: subject-oriented, integrated • Data mart – a data warehouse for a more limited business scope (e.g., a department, etc.) • A data warehouse may be built from several data marts 24 Basic Elements of a Data Warehouse System Source System (Legacy) Storage: Relational The Data Warehouse Presentation Servers Data Staging Area extract Flat files (fastest); RDBMS; Other Populate, replicate, recover Processing: Flat files Spreadsheets extract ERP Legacy extract Uploaded cleaned dimensions Clean; Prune; Combine; Remove duplicates; households; standardize; conform dimensions; store awaiting replications; archive; export to data marts Populate, replicate, recover No user query services Populate, replicate, recover Data Mart #1: OLAP ( ROLAP and/or MOLAP) query services; dimensional! Subject oriented; locally implemented; user group driven; may store atomic data; may be frequently refreshed; conform to DW Bus End User Data Access feed Ad Hoc Query Tools feed Report Writers Conformed feed dimensions and facts DW BUS Data Mart #2 DW BUS feed Conformed dimensions and facts Data Mart #3 Uploaded model results End User Applications Models: forecasting; scoring; allocating; data mining; other downstream Systems; other parameters; 25 special UI Current Practice of DW* • Expected DW market value in 2002 was projected to have grown to $113.5 billion. • Average DW development cost is $1.5 million and average maintenance cost is $0.5 million. • DW development time ranges from 1 to 3 yrs. 26 * Source: H.J. Watson, “ Current Practicing in Data Warehousing”, I.S. Management, 2001 Current Practice of DW* • Sponsorship for the DW project Sponsor Percentage VP of a business unit 39.8 CIO 26.9 Business unit manager 16.7 CEO 11.1 Other 25.0 27 * Source: H.J. Watson, “ Current Practicing in Data Warehousing”, I.S. Management, 2001 Current Practice of DW* • DW Benefits – – – – Less effort to produce better information Better decisions Improvement of business processes Support for accomplishments of strategic business objectives • Return on Investments and Cost of Ownership? 28 * Source: H.J. Watson, “ Current Practicing in Data Warehousing”, I.S. Management, 2001