INTRODUCTION TO DATA WAREHOUSING

1.0 INTRODUCTION

Data warehousing is a process of transforming information and making it available to users in a timely enough manner to make a difference.

1.1 DATA WAREHOUSE

"A DW is a subject-oriented, integrated, time-varying, non-volatile collection of data that is used primarily in organizational decision making."

A data warehouse is a single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a way they can understand and use in a business context.

1.2 DATA WAREHOUSING MARKET

1. In 1996, close to 90% of IT professionals had either created a data warehouse or were planning to create one.
2. The average 3-year ROI was 400%.
3. The average payback was 2.3 years on costs averaging $2.2 million.

1.3 DISADVANTAGES OF TRADITIONAL QUERY-DRIVEN APPROACH

- Delay in query processing: slow or unavailable information sources, complex filtering and integration
- Inefficient and potentially expensive for frequent queries
- Competes with local processing at sources
- Hasn't caught on in industry

1.4 WAREHOUSING APPROACH

- Information integrated in advance
- Stored in the warehouse for direct querying and analysis

[Figure: warehousing architecture. Labels: Clients; Data Warehouse; Metadata; Integration System; Extractor/Monitor (repeated); Source (repeated).]

1.5 ADVANTAGES OF WAREHOUSING APPROACH

- High query performance, but not necessarily the most current information
- Doesn't interfere with local processing at sources: complex queries run at the warehouse, OLTP at the information sources
- Information copied at warehouse: can modify, annotate, summarize, restructure, etc.; can store historical information; security, no auditing
- Has caught on in industry

2.0 DATA WAREHOUSE ARCHITECTURE

Two-layer: real-time data plus derived data. This is the approach most commonly used in industry at present.
[Figure labels, two-layer: Operational systems; Informational systems; Derived data; Real-time data.]

2.1 DATA WAREHOUSE ARCHITECTURES: CONCEPTUAL VIEW

Three-layer architecture: transformation of real-time data to derived data really requires two steps.

[Figure labels, three-layer: Operational systems; Real-time data; Reconciled data; Derived data; Informational systems. Annotations: View level ("particular informational needs"); Physical implementation of the data warehouse.]

Single-layer: every data element is stored once only (virtual warehouse).

[Figure labels, single-layer: Operational systems; Informational systems; "Real-time data".]

2.2 THE KEY CHARACTERISTICS OF A DATA WAREHOUSE

The key characteristics of a data warehouse are as follows:

- Some data is denormalized for simplification and to improve performance.
- Large amounts of historical data are used.
- Queries often retrieve large amounts of data.
- Both planned and ad hoc queries are common.
- The data load is controlled.

2.2 COMMON ORACLE DATA WAREHOUSING TASKS

In general, high data throughput is the key to a successful data warehouse. As an Oracle data warehousing administrator or designer, you can expect to be involved in the following tasks:

- Configuring an Oracle database for use as a data warehouse
- Designing data warehouses
- Performing upgrades of the database and software to new release levels
- Managing schema objects, such as tables, indexes, and materialized views
- Managing users and security
- Developing routines used for the Extraction, Transformation, and Loading (ETL) process
- Creating reports based on the data in the data warehouse
- Backing up the data warehouse and performing recovery when necessary
- Monitoring the data warehouse's performance and taking preventive or corrective action as required

2.3 TOOLS FOR ADMINISTERING THE DATA WAREHOUSE

The intent of this guide is to enable you to quickly and efficiently create and administer an Oracle data warehouse.
The following are some of the products, tools, and utilities you can use to achieve your goals with your data warehouse:

2.3.1 ORACLE UNIVERSAL INSTALLER

Oracle Universal Installer installs your Oracle software and options. It can automatically start the Database Configuration Assistant (DBCA) to install a database.

2.3.2 ORACLE ENTERPRISE MANAGER

The primary tool for managing your database is Oracle Enterprise Manager, a Web-based interface. After you have installed the Oracle software, created or upgraded a database, and configured the network, you can use Oracle Enterprise Manager to manage your database. Oracle Enterprise Manager also provides an interface for performance advisors and for Oracle utilities such as SQL*Loader and Recovery Manager.

2.3.3 ORACLE WAREHOUSE BUILDER

The primary product for populating and maintaining a data warehouse, Oracle Warehouse Builder provides ETL, data quality management, and metadata management functionality in a single product. Warehouse Builder includes a unified repository hosted on an Oracle Database. Warehouse Builder leverages Oracle Database functionality to generate code optimized for loading into and maintaining Oracle Database targets.

2.3.4 DATABASE TUNING PACK

Oracle Database Tuning Pack offers a set of new technologies that automate the entire database tuning process, which significantly lowers database management costs and enhances performance and reliability. The key features of Oracle Database Tuning Pack that will be used in this guide are the SQL Access and SQL Tuning Advisors.

2.4 PROCESS FLOW WITHIN A DATA WAREHOUSE

The processes that represent data flow within a data warehouse are:

- Extract and load the data.
- Clean and transform the data.
- Back up and archive data.
- Manage queries and direct them to the appropriate data sources.
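The four processes listed above can be sketched end to end in a few lines of Python. This is only an illustrative sketch: the function names, the row layout (`product`, `quantity`), and the in-memory lists standing in for the staging area and the warehouse are all assumptions, not part of any Oracle tool.

```python
import copy

def extract(source_rows):
    """Extract: copy rows out of a source system into a temporary staging list."""
    return [dict(row) for row in source_rows]

def clean_and_transform(staged_rows):
    """Clean and transform: reject nonsensical rows and normalize the rest."""
    cleaned = []
    for row in staged_rows:
        if row["quantity"] < 0:  # a negative count makes no sense, so drop the row
            continue
        cleaned.append({"product": row["product"].strip().lower(),
                        "quantity": row["quantity"]})
    return cleaned

def load(warehouse, rows):
    """Load: append the cleaned rows to the warehouse store."""
    warehouse.extend(rows)

def back_up(warehouse):
    """Back up: keep an independent copy that survives changes to the warehouse."""
    return copy.deepcopy(warehouse)

def run_query(warehouse, product):
    """Query management (simplified): answer a query from the warehouse store."""
    return sum(r["quantity"] for r in warehouse if r["product"] == product)

# Hypothetical source data, including one row with a nonsensical count.
source = [
    {"product": " Widget ", "quantity": 3},
    {"product": "widget", "quantity": -1},
    {"product": "Gadget", "quantity": 5},
]

warehouse = []
load(warehouse, clean_and_transform(extract(source)))
backup = back_up(warehouse)
total_widgets = run_query(warehouse, "widget")
print(total_widgets)
```

A real warehouse would run each stage as a separate scheduled job against database tables; the sketch only shows the order in which data moves.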

2.4.1 EXTRACT AND LOAD PROCESS

Data extraction takes data from source systems and makes it available to the data warehouse; data load takes extracted data and loads it into the data warehouse. Data in operational systems is held in a form suitable for that system. When we extract data from a physical database, the original information content will have been modified and extended over the years in order to support the data requirements of the operational system. Before loading the data into the data warehouse, this information content must be reconstructed. Once the data is extracted from the source systems, it is typically loaded into a temporary data store so that it can be cleaned up and made consistent.

2.4.2 CLEAN AND TRANSFORM THE DATA

This is the system process that takes the loaded data and structures it for query performance and for minimizing operational costs. The process has a small number of steps:

1. Clean and transform the loaded data into a structure that speeds up queries.
   - Make sure the data is consistent within itself. When you take a row of data and examine it, the contents of the row must make sense. Errors at this point stem from errors in the source systems. Typical checks are for nonsensical phone numbers, addresses, counts and so on.
   - Make sure the data is consistent with other data within the same source.
   - Make sure the data is consistent with data in the other source systems.
   - Make sure the data is consistent with the information already in the warehouse.
2. Partition the data in order to speed up queries, optimize hardware performance and simplify the management of the data warehouse.
3. Create aggregations to speed up the common queries.

2.4.3 BACK UP AND ARCHIVE PROCESS

As in operational systems, the data within the data warehouse is backed up regularly to ensure that the data warehouse can always be recovered from data loss, software failure or hardware failure.
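As a minimal sketch of backup and recovery, the example below uses Python's built-in `sqlite3` module and its `Connection.backup` method. SQLite is only a stand-in for the warehouse database here, and the `sales` table and its values are invented for illustration; an Oracle warehouse would use Recovery Manager instead.

```python
import sqlite3

# In-memory stand-in for the warehouse; table and column names are hypothetical.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (sale_month TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01", 120.0), ("2024-02", 95.5)],
)
warehouse.commit()

# Regular backup: copy the whole database into a second connection.
backup = sqlite3.connect(":memory:")
warehouse.backup(backup)

# Simulate data loss in the warehouse.
warehouse.execute("DELETE FROM sales")
warehouse.commit()
rows_after_loss = warehouse.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

# Recovery: copy the backup over the damaged warehouse.
backup.backup(warehouse)
restored_rows = warehouse.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(rows_after_loss, restored_rows)
```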

In archiving, older data is removed from the system in a format that allows it to be quickly restored if required. For example, in a retail sales analysis data warehouse there may be a requirement to keep data for three years, with the latest six months being kept online. In this sort of process there is often a requirement to be able to do month-on-month comparisons for this year and last year. This will require some months of data to be temporarily restored from archive.

2.4.4 MANAGING QUERIES

The query management process is the system process that manages queries and speeds them up by directing them to the most effective data source. This process must also ensure that all the system resources are used in the most effective way, usually by scheduling the execution of queries. The query management process may also be required to monitor the actual query profiles. Unlike other system processes, the query management process does not operate during the regular load of information into the data warehouse. It operates at all times that the data warehouse is made available to end users.

3.0 DECISION SUPPORT SYSTEM

An application that issues queries to the read-only database is called a decision support system (DSS).

- Used to manage and control the business
- Data is historical or point-in-time
- Optimized for inquiry rather than update
- Use of the system is loosely defined and can be ad hoc
- Used by managers and end users to understand the business and make judgments

An application that updates the database is called an on-line transaction processing (OLTP) application.

3.1 DATA WAREHOUSE FOR DECISION SUPPORT

Putting information technology to work to help the knowledge worker make faster and better decisions:

- Which of my customers are most likely to go to the competition?
- What product promotions have the biggest impact on revenue?
- How did the share price of software companies correlate with profits over the last 10 years?
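A decision-support question such as the one about product promotions typically becomes an aggregate query over historical data. The sketch below uses Python's `sqlite3` module as a stand-in for the warehouse. The `sales` table, the promotion names, and the revenue figures are invented for illustration, and ranking promotions by total revenue is only a crude proxy for "impact".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (promotion TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("spring_coupon", 100.0),
        ("spring_coupon", 250.0),
        ("loyalty_bonus", 400.0),
        ("no_promotion", 150.0),
    ],
)
conn.commit()

# A DSS reads but does not update: put this connection into query-only mode.
conn.execute("PRAGMA query_only = ON")

# Rank promotions by the total revenue recorded against them.
ranking = conn.execute(
    "SELECT promotion, SUM(revenue) AS total "
    "FROM sales GROUP BY promotion ORDER BY total DESC"
).fetchall()

for promotion, total in ranking:
    print(promotion, total)
```

Queries of this shape scan and summarize many rows, which is why the warehouse is optimized for inquiry rather than update.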

3.2 SECURITY IN DATA WAREHOUSE

Building a data warehouse does increase security risk, because key corporate information is all in one place. To mitigate that risk, database system components can be used to protect the data warehouse. These include:

- Views
- Access control
- Security administration
- Encryption
- Audit

What each component does:

- Views: allow users to see only certain rows or columns of data.
- Access control: indicates which users have access to what data.
- Security administration: used to actually give access to groups of users and to define the accesses given to either an individual or a group.
- Encryption: protects data from access outside of the DBMS.
- Audit: tracks what users are doing.
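
Three of these components (views, access control, and audit) can be illustrated with a small sketch in Python's `sqlite3` module. The view is real SQL; the `ALLOWED` permission table, the `query` helper, and the `audit_log` list are hypothetical stand-ins for what a full DBMS provides through its own grant and audit facilities, and the customer data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, region TEXT, credit_limit REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [("Acme", "north", 5000.0), ("Birch", "south", 2000.0), ("Cedar", "north", 750.0)],
)

# View: exposes only the 'north' rows and hides the credit_limit column.
conn.execute(
    "CREATE VIEW north_customers AS "
    "SELECT name, region FROM customers WHERE region = 'north'"
)

# Access control (hypothetical): which user may read which table or view.
ALLOWED = {
    "analyst": {"north_customers"},
    "admin": {"customers", "north_customers"},
}

# Audit (hypothetical): every attempt is recorded, whether allowed or not.
audit_log = []

def query(user, relation):
    """Return all rows of `relation` if `user` is permitted; log the attempt."""
    allowed = relation in ALLOWED.get(user, set())
    audit_log.append((user, relation, allowed))
    if not allowed:
        raise PermissionError(f"{user} may not read {relation}")
    # `relation` was checked against the allow-list above, so it is safe to interpolate.
    return conn.execute(f"SELECT * FROM {relation} ORDER BY name").fetchall()

analyst_rows = query("analyst", "north_customers")

try:
    query("analyst", "customers")
    denied = False
except PermissionError:
    denied = True

print(analyst_rows, denied, len(audit_log))
```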