* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download DATA WAREHOUSING
Survey
Document related concepts
Transcript
DATA WAREHOUSING Legacy System Systems that were developed in the early years of business processing Rich source of historical data, but it’s difficult to retrieve, because of nonstandard features This is why we need data warehouse Problems with Legacy System Access data from a legacy system may be difficult for several reasons: Developed for a different hardware or software platform Use a different data model Use a different DBMS Use a different data definitions Use a different data format All these make difficulty in integration and sharing data Data Definitions Problems Homonyms – use different field names to store the same data in the different database Synonyms - use the same field names to store different data in the different database Domain integrity – domain for the same field may be different Business rules – may be different in different database Referential integrity – may be problems linking related records from different databases Concurrency control – when multiple users access a database that design for single user Data Warehouse Concepts Technique of extracting and filtering data from diverse database and use this data to build a new database Stores information extracted from historical, operational and external databases The primary purpose : to provide information for management decision making Database vs data warehouse Activity Database Data warehosue Function Support business operation Support decision making Data Process oriented Subject-oriented Usage Structured, repetitive Unstructured, repetitive Processing Data entry End user initiated queries Data Warehouse Architecture Operational database / external database layer Information access layer Data access layer Metadata layer Process management layer Application messaging layer Physical layer Data staging layer Data Warehouse Implementation Data – includes operational, historical and external data Extraction and transformation – extract and transform data in different table Data warehouse storage – store the extracted and transformed data in different table Historical data – used for forecasting purposes Reports, statistics, data analysis and presentation – output from data warehouse to make a decision Data Warehouse : Benefits and Risks Benefits : Reduces reporting cost Reduces data consolidation and integration cost Increase efficiency and decision making capabilities Risks House the wrong data Expensive to build and maintain Require organizational changes Online Analytical Processing Support data modeling and multidimensional data analysis Share the characteristics : Provide user-friendly interface Use multidimensional data analysis technique Provide advanced database support Support client/server architecture Online Analytical Processing Can be classified : Relational Online Analytical Processing – use RDBMS Multidimensional Online Analytical Processing – extension of RDBMS Data Mining Data mining is a decision support tools that enables a user to access directly large amount of data and analyzes the data Data mining is the set of activities used to find new, hidden, or unexpected patterns in data Data Mining Technique Data mining process has four phases : Data preparation – main data sets to be used are identified and cleaned Data analysis and classification – identify common data characteristic or pattern Knowledge acquisition – develop a model resemble target data Prediction – used to predict future behaviour and forecast business outcomes Data Mining Tools Data mining tools today has this following characteristics : Data preparation facilities Selection of data mining operations Product scalability and performance Facilities for visualization of results END