Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Operational transformation wikipedia , lookup
Data Protection Act, 2012 wikipedia , lookup
Data center wikipedia , lookup
Forecasting wikipedia , lookup
Data analysis wikipedia , lookup
Data vault modeling wikipedia , lookup
3D optical data storage wikipedia , lookup
Basic Concepts of Datawarehousing An Overview Prasanth Gurram How to answer these Business Queries? What is the sales distribution region wise? How did my revenue improve in the past 5 years? What are the slow movers in my product line? Which of my Sales Agents are doing better? Which channel costs me more and pays less? Strategic Planning / Budgeting What is Defaulter’s Profile? Who are my profitable customers? Currency Risk, Interest Rate Risk, Liquidity Risk DSS Decision Support Systems (DSS) are interactive computerbased systems intended to help decision makers utilize data and models to identify and solve problems and make decisions. Data Warehouse is the foundation of DSS process. It is a Strategy and a Process for Staging Corporate Data. Enable users to get a “Business View” of the data Facilitate Data based Decision Making that would drive and improve the Business Discover “Hidden Trends” Driving Forces for DSS Business Speed Reform Customers RESULT: Technology Scenario without DSS Unavailability of Tools and Techniques for acquisition of data from various sources for answering business questions and making decisions, in earlier days Intensive efforts in data formatting than data analysis Static and inflexible report generation Time-lag in accessing the information at central place OLTP v/s DSS Environment OLTP Environment DSS Environment get data IN large volumes of simple transaction queries continuous data changes low processing time mode of processing transaction details data inconsistency mostly current data get information OUT small number of diverse queries periodic updates only high processing time mode of discovery subject oriented summaries data consistency historical data is relevant OLTP v/s DSS Environment OLTP Environment DSS Environment high concurrent usage highly normalized data structure static applications automates routines low concurrent usage fewer tables, but more columns per table dynamic applications facilitates creativity Benefits for Business User Flexible Information Access High Availability Ease of Use Quality & Completeness of Data Focus on Information Processing Information Base for Knowledge Discovery Available line of technology Advances in dbms technology Data warehousing On-line analytical processing Data mining Datawarehouse Data warehouses store large volumes of data which are frequently used by DSS.It is maintained separately from the organization’s operational databases Data warehouse is subject-oriented, integrated, timevariant, and nonvolatile collection of data Subject-oriented : Contains information regarding objects of interest for decision support: Sales by region, by product, etc. Itegrated: Data are typically extracted from multiple, heterogeneous data sources (e.g., from sales, inventory, billing DBs etc.). Time-variant: Contain historical data, longer horizon than operational system. Nonvolatile : Data is not (or rarely) directly updated. Datawarehouse Is the enabling technology that facilitates improved business decision-making It’s a process, not a product A technique for assembling and managing a wide variety of data from multiple operational systems for decision support and analytical processing It’s a journey not a destination... DW Components Extraction FS1 FS2 . . . FSn Transmission N E T W O R K Legacy System Data Mart Population Cleansing S T A G I N G Transformation Aggregation Summarization DM1 ODS DW DM2 DMn A R E A OLAP ANALYSIS Knowledge Discovery Metadata Layer Operational Process Data extraction Data Cleansing and Transformation Data Load and refresh Build derived data and views Service queries Administer the warehouse Extraction Process ( Data Capturing ) Business Transactions Feed System Application Data Capturing Process Incremental Data Control Metadata •Extract the incremental data from feed system •Store the extracted data into a temporary area •Extract data from multiple, heterogeneous, and external sources Extraction Process (Data Transmission ) Feed System Side Incremental Data Network Cloud FTP Staging area Incremental Data •Transmit the extracted data from Feed system to Staging area • Periodicity of transmission ( daily / weekly ) depends upon the feed system Cleansing Process Process Metadata Cleansing Rules Raw data (Staging Area) Cleansing Process Clean data Good Control Metadata Bad •Detect errors in the data and rectify them when possible •Mark it Good/Bad •Generate the cleansing Reports and mail to the DWA and Feed System representatives Cleansing Reports Transformation Process Process Metadata •Mapping Detail •Transformation Rule Clean Operational Data Operational Data Store Transformation Process Control Metadata •Transform the cleaned Operational Data into DSS Data •Load the DSS data into ODS •ODS contains the current DSS data at the lowest level of granularity Summarization Process DW ODS Summarization Process Weekly Monthly Control Metadata • Summarize and aggregate ODS data and Populate to the Warehouse • Periodicity of Summarization Process depends upon the level of summarization at Warehouse ( weekly, monthly, daily ) Yearly Metadata Data about Data Used to maintain Datawarehouse Control data Static: Roles, permissions, naming standards, source system names, Locations, target names, transformation and mapping rules Dynamic: Scheduling, scripts, load statistics, space usage, Backup statistics Business data Business rules,Who validates data,Who controls,How they validate DW Components/Tools Extraction/transformation/load tool (family of tools including data modeling tool, extraction tool, Meta data repository, and DW administration tools) Meta data exchange architecture (API used to integrate all components of DW with central Meta data) Target databases (relational, multidimensional, hybrid) Data access and analysis tools for end users Database servers, operating systems, networks DW Tools Tool Category ETL Tools Products ETI Extract, Informatica, IBM Visual Warehouse Oracle Warehouse Builder OLAP Server Oracle Express Server, Hyperion Essbase, IBM DB2 OLAP Server, Microsoft SQL Server OLAP Services, Seagate HOLOS, SAS/MDDB Oracle Express Suite, Business Objects, Web Intelligence, SAS, Cognos Powerplay/Impromtu, KALIDO, MicroStrategy, Brio Query, MetaCube,Informatica Oracle, Informix, Teradata, DB2/UDB, Sybase, Microsoft SQL Server, RedBricks SAS Enterprise Miner, IBM Intelligent Miner, SPSS/Clementine OLAP Tools Data Warehouse Data Mining & Analysis