What Is a Dimensional Data Warehouse?
Program Pelatihan Tenaga Informasi dan Informatika Sistem Informasi Kesehatan (Training Program for Health Information System Information and Informatics Personnel)

References:
▪ Paulraj Ponniah. 2010. Data Warehousing Fundamentals for IT Professionals. John Wiley & Sons.
▪ Vincent Rainardi. 2008. Building a Data Warehouse: With Examples in SQL Server. Apress.
▪ William H. Inmon. 2005. Building the Data Warehouse. Wiley.

1980s to early 1990s
▪ Focus on computerizing business processes
▪ To gain competitive advantage

By the early 1990s
▪ All companies had operational systems
▪ Operational systems no longer offered any advantage
▪ How to get competitive advantage?

Information
▪ A process of transforming data into information and making it available to users in a timely enough manner to make a difference [Forrester Research]

Data
▪ Companies, over the years, gathered huge volumes of data
▪ “Hidden Treasure”
▪ Can this data be used in any way?
▪ Can we analyze this data to get any competitive advantage?
▪ Allows “efficient” analysis of data

Competitive Advantage
▪ Analysis aids strategic decision making
▪ Increased productivity of decision makers
▪ Potential high ROI
▪ Quick decisions

“The ultimate goal is simple: Give the battlefield commander access to all the information needed to win the war. And give it to him when he wants it, where he wants it and how he wants it.”
-- Gen. Colin L. Powell, “Information Warriors,” BYTE, 1992

Application areas by industry:
▪ Retail: Customer Loyalty, Market Planning
▪ Manufacturing: Cost Reduction, Logistics Management
▪ Financial: Risk Management, Fraud Detection
▪ Utilities: Asset Management, Resource Management
▪ Airlines: Route Profitability, Yield Management
▪ Government: Manpower Planning, Cost Control

Strategic information is needed to formulate business strategies, establish goals, set objectives, and monitor results.
Examples of business objectives:
▪ Retain the present customer base
▪ Increase the customer base by 15% over the next 5 years
▪ Improve product quality levels in the top five product groups
▪ Gain market share by 10% in the next 3 years
▪ Enhance customer service level in shipments
▪ Bring three new products to market in 2 years
▪ Increase sales by 15% in the East Division

Characteristics of strategic information:
▪ INTEGRATED: Must have a single, enterprise-wide view.
▪ DATA INTEGRITY: Information must be accurate and must conform to business rules.
▪ ACCESSIBLE: Easily accessible with intuitive access paths, and responsive for analysis.
▪ CREDIBLE: Every business factor must have one and only one value.
▪ TIMELY: Information must be available within the stipulated time frame.

Benefits of a data warehouse:
▪ Ease: It combines information from different, separate systems in one location that is easy to access.
▪ Speed: DW tables are specifically designed for quick response time and handle large quantities of data. Reports and other data are precalculated.
▪ Reliability: The DW is a read-only database, which gives stability over time.
▪ Flexibility: Utilizing BI tools.

Data warehousing is a simple concept. It is born out of the need for strategic information and is the result of the search for a new way to provide such information.

An Environment, Not a Product; A Blend of Many Technologies
▪ A data warehouse is not a single software or hardware product you purchase to provide strategic information.
▪ It is a computing environment where users can find strategic information, an environment where users are put directly in touch with the data they need to make better decisions.
▪ It is a user-centric environment.

Characteristics of the new computing environment called the data warehouse:
▪ An ideal environment for data analysis and decision support
▪ Fluid, flexible, and interactive
▪ 100% user-driven
▪ Very responsive and conducive to the ask–answer–ask again pattern
▪ Provides the ability to discover answers to complex, unpredictable questions

The basic concept of data warehousing is:
▪ Take all the data from the operational systems.
▪ Where necessary, include relevant data from outside, such as industry benchmark indicators.
▪ Integrate all the data from the various sources.
▪ Remove inconsistencies and transform the data.
▪ Store the data in formats suitable for easy access for decision making.

Definitions of a data warehouse:
▪ A decision support database that is maintained separately from the organization’s operational databases.
▪ A data warehouse is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making.

“A collection of integrated, subject-oriented databases designed to supply the information required for decision making.”
-- W. Inmon (1992)

“A data warehouse is a system that retrieves and consolidates data periodically from the source systems into a dimensional or normalized data store. It usually keeps years of history and is queried for business intelligence or other analytical activities. It is typically updated in batches, not every time a transaction happens in the source system.”
-- Vincent Rainardi (2005)

“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”
-- Barry Devlin, IBM Consultant

[Figure, labels only: Relational Databases, ERP Systems, Purchased Data, Legacy Data; Optimized Loader (Extraction, Cleansing); Data Warehouse Engine; Metadata Repository; Analyze, Query]

The primary concept of data warehousing is that the data stored for business analysis can most effectively be accessed by separating it from the data in the operational systems.
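The basic concept above (take data from the operational systems, integrate it, remove inconsistencies, and store it separately for analysis) can be sketched in a few lines of Python. This is only an illustrative sketch: the two source systems, their field names, the date formats, and the `sales` table are assumptions invented for the example, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3
from datetime import datetime


def from_retail(row):
    """Map a (hypothetical) retail-system record to the common warehouse format."""
    return (row["cust"].strip().upper(), float(row["amount"]), row["date"])


def from_catalog(row):
    """Map a (hypothetical) catalog-system record: fix name, cents, and date format."""
    iso_date = datetime.strptime(row["order_date"], "%d/%m/%Y").date().isoformat()
    return (row["customer_name"].strip().upper(), row["total_cents"] / 100, iso_date)


def build_warehouse(retail_rows, catalog_rows):
    """Integrate both sources and store them in one separate analysis database."""
    dw = sqlite3.connect(":memory:")
    dw.execute("CREATE TABLE sales (customer TEXT, amount REAL, sale_date TEXT)")
    rows = [from_retail(r) for r in retail_rows]
    rows += [from_catalog(r) for r in catalog_rows]
    dw.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    dw.commit()
    return dw


# Assumed sample extracts: the same customer is spelled and formatted differently.
retail_rows = [{"cust": "Acme Corp", "amount": "120.50", "date": "2010-01-05"}]
catalog_rows = [
    {"customer_name": "acme corp ", "total_cents": 9900, "order_date": "06/01/2010"}
]

dw = build_warehouse(retail_rows, catalog_rows)
totals = dw.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer"
).fetchall()
print(totals)
```

After the inconsistencies in name, amount, and date format are removed, both records roll up under one customer, which is the kind of single view the operational systems cannot give on their own.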
Fundamental differences between the operational and informational (DW) environments:
▪ Nature of the data
▪ Development cycle
▪ Supporting technology
▪ User community
▪ Processing characteristics

Characteristics of data warehouse data:
▪ Subject-Oriented Data
▪ Integrated Data
▪ Time-Varying Data
▪ Nonvolatile Data
▪ Data Granularity

Subject-oriented data
▪ A data warehouse is designed around “subjects” rather than processes
▪ A company may have a Retail Sales System, an Outlet Sales System, and a Catalog Sales System
▪ The DW will have a Sales Subject Area

Integrated data
▪ Heterogeneous source systems
▪ Little or no control
▪ Need to integrate source data
▪ For example: product codes could be different in different systems; arrive at a common code in the DW

Time-varying data
▪ Most business analysis has a time component
▪ Trend analysis (historical data is required)

Data granularity
▪ In a data warehouse it is efficient to keep data summarized at different levels. Depending on the query, you can then go to the particular level of detail and satisfy the query.
▪ Data granularity in a data warehouse refers to the level of detail. The lower the level of detail, the finer the data granularity.
▪ If we want to keep data at the lowest level of detail, we have to store a lot of data in the data warehouse.
▪ We will have to decide on the granularity levels based on the data types and the expected system performance for queries.
▪ Depending on the requirements, multiple levels of detail may be present. Many data warehouses have at least dual levels of granularity.
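To make the integration and granularity points concrete, here is a minimal Python sketch. The product codes, the `COMMON_CODE` mapping, and the sample rows are hypothetical; the sketch only shows the idea of mapping source-specific codes to one warehouse code and then keeping the same data at two levels of detail (daily and monthly).

```python
from collections import defaultdict

# Hypothetical mapping from source-system product codes to one warehouse code.
COMMON_CODE = {
    ("retail", "R-100"): "P100",
    ("outlet", "OUT100"): "P100",
    ("catalog", "C-7"): "P007",
}

# Detail-level sales rows: (source system, source product code, ISO date, units).
detail = [
    ("retail", "R-100", "2010-01-05", 3),
    ("outlet", "OUT100", "2010-01-05", 2),
    ("outlet", "OUT100", "2010-01-20", 4),
    ("catalog", "C-7", "2010-02-02", 1),
]


def summarize(rows, date_chars):
    """Sum units per (warehouse product code, date prefix of the given length)."""
    totals = defaultdict(int)
    for system, code, date, units in rows:
        totals[(COMMON_CODE[(system, code)], date[:date_chars])] += units
    return dict(totals)


daily = summarize(detail, 10)   # finer granularity: one row per product per day
monthly = summarize(detail, 7)  # coarser granularity: one row per product per month
print(daily)
print(monthly)
```

The daily summary holds more rows than the monthly one, which illustrates the trade-off described above: finer granularity answers more detailed questions but costs more storage.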
▪ Extract, Transform, Load (ETL) tools
▪ DW databases & DBMS tools
▪ Data marts
▪ Metadata
▪ DW administration & management tools
▪ Information delivery system

Extract, Transform, Load (ETL)
▪ Data Extraction
▪ Data Cleaning
▪ Data Transformation: convert from legacy/host format to warehouse format
▪ Load: sort, summarize, consolidate, compute views, check integrity, build indexes, partition
▪ Consumes 70-80% of project time

ETL challenges
▪ Heterogeneous source systems
▪ Little or no control over source systems
▪ Source systems scattered
▪ Different currencies, measurement units
▪ Ensuring data quality

Data staging
▪ A storage area where extracted data is cleaned, transformed and deduplicated
▪ Initial storage for data
▪ Need not be based on the relational model
▪ Mainly sorting and sequential processing
▪ Does not provide data access to users
▪ Analogy: the kitchen of a restaurant

Commercial ETL tools: Warehouse Builder (Oracle), Data Transformation Services and SSIS (Microsoft), DataStage, SAS ETL Server

Typical functions: define source, query (run SQL), define transformation, define target, verify transformation, schedule run, audit report

DW databases & DBMS tools
▪ Almost always a relational DB: Oracle, DB2, Sybase, SQL Server
▪ New DB design for the special purpose of the DW (e.g., scale up, speed up, parallel processing)

▪ OLTP systems are data capture systems: “DATA IN” systems
▪ DWs are “DATA OUT” systems

Design of the DW
▪ Must directly reflect the way the managers look at the business
▪ Should capture the measurements of importance along with the parameters by which these measurements are viewed
▪ Must facilitate data analysis, i.e., answering business questions

ER modeling
▪ A logical design technique that seeks to eliminate data redundancy
▪ Illuminates the microscopic relationships among data elements
▪ Perfect for OLTP systems
▪ Responsible for the success of transaction processing in relational databases

ER models are NOT suitable for DW?
▪ End users cannot understand or remember an ER model
▪ Many DWs have failed because of overly complex ER designs
▪ Not optimized for complex, ad-hoc queries
▪ Data retrieval becomes difficult due to normalization
▪ Browsing becomes difficult

Most relational databases are set to 3rd normal form:
▪ 1st normal form: tables have unique keys and no repeating groups or multi-value fields
▪ 2nd normal form: every attribute is dependent on the entire key of the table
▪ 3rd normal form: attributes are dependent only on the key; no derived elements

The business needs to analyze data so that it can:
▪ Understand trends
▪ Predict future behavior and needs
▪ Personalize contact with customers
▪ Be competitive
All of this in a speedy manner, with the ability to do “what ifs”.

Problems with operational data for analysis:
▪ Data is not structured for analytical usage
▪ Multiple joins are resource intensive
▪ Missing data from external sources and context history, which are not in operational sources

“A structured repository of validated and integrated historical information accessible to business people to provide the basis for both tactical and strategic business decisions.”
▪ Centralized extract and staging
▪ Separate from operational system
▪ Structured for analysis
▪ Historically contexted

[Figure, labels only: Relational Data, External Data, Enterprise Data; Data Distribution; Acquisition, Staging, Cleaning, Transformation; Data Warehouse Storage; Analytical Applications]

Levels of data warehouse storage:
▪ Detail level: dimensional, normal form; value and feasibility
▪ Analytical level: structured for the required analyses
▪ Summary level: summaries for user requirements; better response time

▪ Normalized for maintainability
▪ De-normalized for performance, based on rules
▪ 2-level structure, therefore only one level of joins is required for queries

Subject, Fact, Dimension
Dimension:
▪ Aspect / Factor
▪ Level of reality
▪ Lifelike quality

▪ Facts are stored in FACT tables
▪ Dimensions are stored in DIMENSION tables
▪ Dimension tables contain textual descriptors of the business
▪ Fact and dimension tables form a Star Schema
▪ “BIG” fact table in the center surrounded by “SMALL” dimension tables

▪ Measures or facts: facts are “numeric” & “additive”, for example Sale Amount, Sale Units
▪ Factors or dimensions
▪ Star Schemas
▪ Snowflake & Starflake Schemas

Data marts
▪ Data mart = subset of the DW for community users, e.g. the accounting department
▪ Sometimes exists as a multidimensional database
▪ Info mart = summarized data + reports for community users

Metadata
▪ Data about data
▪ Field descriptions, business rules (e.g. the formula for profit), log of file updates
▪ Helps users understand content & locate data

DW administration & management
▪ Security & priority
▪ Keep track of updates
▪ QC
▪ Purging & copying to data marts
▪ The security issue is critical (users at many levels)

Some security measures to protect a DW:
▪ Views = limit users to see certain rows/columns
▪ Access control = grant rights to specific users to access selected data (can be created by the DBA through SQL commands such as GRANT/REVOKE)
▪ Admin controls such as group access, firewall, encryption
▪ Audit = track what users are doing

Information delivery
▪ Tools: query & reporting, OLAP, data mining, visualization, segmentation, clustering
▪ New developments: text mining, web mining & personalization, mining multimedia data
▪ Commercial tools: MS SQL Server Business Intelligence, Oracle Business Intelligence Suite, Crystal Reports, Cognos Solution, WebFOCUS
▪ Increasingly common mode of delivery: Web-enabled

Thank you
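As a closing illustration of the star schema and the view-based security measure described in these slides, the following Python sketch builds a tiny star schema in an in-memory SQLite database. All table names, column names, and rows are hypothetical. SQLite is used only because it ships with Python; it does not support GRANT/REVOKE, so only the view is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Small dimension tables hold textual descriptors of the business.
CREATE TABLE dim_product (
    product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_store (
    store_key INTEGER PRIMARY KEY, store_name TEXT, division TEXT);

-- The central fact table holds numeric, additive measures plus dimension keys.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    sale_units  INTEGER,
    sale_amount REAL);

INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware');
INSERT INTO dim_store VALUES (1, 'Store A', 'East'), (2, 'Store B', 'West');
INSERT INTO fact_sales VALUES (1, 1, 10, 100.0), (2, 1, 5, 75.0), (1, 2, 4, 40.0);

-- A view that limits users to the rows of one division.
CREATE VIEW east_sales AS
    SELECT f.* FROM fact_sales f
    JOIN dim_store s ON s.store_key = f.store_key
    WHERE s.division = 'East';
""")

# One level of joins: the fact table joined directly to a dimension table.
by_division = conn.execute("""
    SELECT s.division, SUM(f.sale_units), SUM(f.sale_amount)
    FROM fact_sales f
    JOIN dim_store s ON s.store_key = f.store_key
    GROUP BY s.division
    ORDER BY s.division
""").fetchall()
print(by_division)

east_row_count = conn.execute("SELECT COUNT(*) FROM east_sales").fetchone()[0]
print(east_row_count)
```

Because every dimension table joins directly to the fact table, the query needs only one level of joins, and the facts can be summed across any dimension attribute such as `division`.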