Sachin Goel (68), Manav Mudgal (69), Piyush Samsukha (76), Rachit Singhal (82), Richa Somvanshi (85), Sahar ( )

Outline
- Data warehousing
- Warehouse architecture
- Its components
- Data flows
- Data marts
- Benefits of data warehousing
- Disadvantages of data warehousing
- Case study

What is data warehousing?
A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management's decision-making process.
Data warehousing involves both data management and data analysis.
A data webhouse is a distributed data warehouse that is implemented over the Web with no central data repository.
Goal: to integrate enterprise-wide corporate data into a single repository from which users can easily run queries.

What is data warehousing?
Subject-oriented: the warehouse is organized around the major subjects of the enterprise rather than the major application areas. This is reflected in the need to store decision-support data rather than application-oriented data.
Integrated: the source data come together from different enterprise-wide application systems. The source data is often inconsistent, so the integrated data source must be made consistent to present a unified view of the data to the users.
Time-variant: the data in the warehouse is only accurate and valid at some point in time or over some time interval. The time-variance of the data warehouse is also shown in the extended time that the data is held, the implicit or explicit association of time with all data, and the fact that the data represents a series of snapshots.
Non-volatile: data is not updated in real time but is refreshed from the operational systems on a regular basis. New data is always added as a supplement to the database, rather than as a replacement.
The database continually absorbs this new data, incrementally integrating it with the previous data.

The architecture
[Figure: Typical architecture of a data warehouse, showing operational data sources 1 to n, the operational data store (ODS), load manager, warehouse manager, query manager, a DBMS holding detailed, lightly summarized, and highly summarized data, meta-data, archive/backup data, and end-user access tools (reporting, query, application development, and EIS tools; OLAP tools; data mining tools).]

The main components
Operational data sources: the data for the data warehouse is supplied from:
- Mainframe systems, in the traditional network and hierarchical formats
- Relational DBMSs such as Oracle and Informix
- External data, in addition to these internal sources, obtained from commercial databases and from databases associated with suppliers and customers
Operational data store (ODS): a repository of current and integrated operational data used for analysis. It is often structured and supplied with data in the same way as the data warehouse, but may in fact simply act as a staging area for data to be moved into the warehouse.

The main components
Load manager: also called the front-end component, it performs all the operations associated with the extraction and loading of data into the warehouse. These operations include simple transformations of the data to prepare it for entry into the warehouse.
Warehouse manager: performs all the operations associated with the management of the data in the warehouse. These operations include:
- Analysis of data to ensure consistency
- Transformation and merging of the source data from temporary storage into data warehouse tables
- Creation of indexes and views on the base tables
- Generation of aggregations
- Backing up and archiving of data

The main components
Query manager: also called the back-end component, it performs all the operations associated with the management of user queries. These operations include directing queries to the appropriate tables and scheduling the execution of queries.
Detailed, lightly summarized, and highly summarized data; archive/backup data
Meta-data
End-user access tools: these can be categorized into five main groups: data reporting and query tools, application development tools, executive information system (EIS) tools, online analytical processing (OLAP) tools, and data mining tools.

Data flows
- Inflow: the processes associated with the extraction, cleansing, and loading of the data from the source systems into the data warehouse.
- Upflow: the processes associated with adding value to the data in the warehouse through summarizing, packaging, and distribution of the data.
- Downflow: the processes associated with archiving and backing up the data in the warehouse.
- Outflow: the processes associated with making the data available to the end users.
- Meta-flow: the processes associated with the management of the meta-data.
[Figure: Information flows of a data warehouse, showing inflow, upflow, downflow, outflow, and meta-flow between the operational data sources, ODS, load manager, warehouse manager, query manager, DBMS (detailed, lightly summarized, and highly summarized data), meta-data, archive/backup data, and end-user access tools.]

Data mart
A data mart is a subset of a data warehouse that supports the requirements of a particular department or business function.
The characteristics that differentiate data marts from data warehouses include:
- A data mart focuses only on the requirements of users associated with one department or business function.
- Data marts do not normally contain detailed operational data, unlike data warehouses.
- As data marts contain less data than data warehouses, they are more easily understood and navigated.
[Figure: Typical data warehouse and data mart architecture (first, second, and third tiers), showing the warehouse components above plus data marts holding summarized data in a relational database and in a multidimensional database.]

Reasons for creating a data mart
- To give users access to the data they need to analyze most often.
- To provide data in a form that matches the collective view of the data by a group of users in a department or business function.
- To improve end-user response time due to the reduction in the volume of data to be accessed.
- To provide appropriately structured data as dictated by the requirements of end-user access tools.
- Data marts normally use less data, so tasks such as data cleansing, loading, transformation, and integration are far easier; hence implementing and setting up a data mart is simpler than establishing a corporate data warehouse.
- The cost of implementing data marts is normally less than that required to establish a data warehouse.
- The potential users of a data mart are more clearly defined and can be more easily targeted to obtain support for a data mart project than for a corporate data warehouse project.

The benefits of data warehousing
The potential benefits of data warehousing are:
- High returns on investment
- Substantial competitive advantage
- Increased productivity of corporate decision-makers
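The data mart characteristics listed earlier (a subset serving one department, normally without detailed operational data, and smaller than the warehouse) can be illustrated with a minimal Python sketch. The `SaleFact` record, its fields, the department and region names, and the `build_data_mart` function are all hypothetical, chosen only for illustration; this is not how a production mart would be loaded.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical detailed fact row, as the warehouse might store it.
@dataclass(frozen=True)
class SaleFact:
    department: str
    region: str
    product: str
    year: int
    amount: float


def build_data_mart(facts, department):
    """Return a summarized data mart for one department (illustrative only).

    Keeps only the given department's rows and aggregates them into
    (region, product, year) totals, so the detailed transactions are dropped.
    """
    totals = defaultdict(float)
    for fact in facts:
        if fact.department == department:
            totals[(fact.region, fact.product, fact.year)] += fact.amount
    return dict(totals)


# Made-up sample data standing in for the warehouse's detailed layer.
warehouse = [
    SaleFact("Retail", "North", "Laptop", 2023, 1200.0),
    SaleFact("Retail", "North", "Laptop", 2023, 800.0),
    SaleFact("Retail", "South", "Monitor", 2024, 300.0),
    SaleFact("Corporate", "North", "Server", 2024, 9000.0),
]

retail_mart = build_data_mart(warehouse, "Retail")
print(retail_mart)
# {('North', 'Laptop', 2023): 2000.0, ('South', 'Monitor', 2024): 300.0}
```

Because the mart keeps only per-(region, product, year) totals for one department, it holds fewer rows than the warehouse, which is the reduced data volume the slides cite as a reason for faster end-user response times.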
Data warehouses facilitate decision support system (DSS) applications such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.

Disadvantages of data warehousing
- Data warehouses are not the optimal environment for unstructured data.
- Because data must be extracted, transformed, and loaded into the warehouse, there is an element of latency in data warehouse data.
- Over their life, data warehouses can have high costs. Maintenance costs are high.
- Data warehouses can become outdated relatively quickly. There is a cost of delivering suboptimal information to the organization.
- There is often a fine line between data warehouses and operational systems. Duplicate, expensive functionality may be developed, or functionality may be developed in the data warehouse that, in retrospect, should have been developed in the operational systems, and vice versa.

TOSHIBA Case Study