Designing the Data Warehouse and Data Mart
Methodologies and Techniques

Data Warehouse
• A data warehouse is the main repository of an organization's historical data, its corporate memory. It contains the raw material for management's decision support system.
• The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis, such as data mining, on the information without slowing down the operational systems. (http://www.sqlpower.ca)
• Bill Inmon, an early and influential practitioner, has formally defined a data warehouse in the following terms (W. H. Inmon, “Building the Data Warehouse”):
  – Subject-oriented: the data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;
  – Time-variant: changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;
  – Non-volatile: data in the database is never overwritten or deleted; once committed, the data is static and read-only, but retained for future reporting; and
  – Integrated: the database contains data from most or all of an organization's operational applications, and this data is made consistent.

Basic Principles
• ETL: Extracting, Transforming (or Transporting) and Loading.

Life Cycle of the DW
[Figure: Operational Databases -> First-time load -> Warehouse Database, followed by repeated Refresh cycles, with old data eventually purged or archived.]

Data Transfers into a Database
• First-time system implementation
  – From a manual system
• Data warehousing projects
• Database version upgrade
• ERP projects
• Migration
  – From old to new system

Data Transfers between Systems
• Dynamic data (e.g. sales orders)
  – Interface required?
• Static data (e.g. customers)
  – Conversion required?
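The ETL life cycle above (first-time load, repeated refreshes, purge or archive) can be sketched as follows. This is a minimal illustration, assuming Python's built-in sqlite3 module stands in for both the operational and the warehouse database; the tables (orders, sales_history, sales_archive), the column amount_pence and the functions extract, transform, load, refresh and purge_or_archive are all hypothetical. Each refresh appends a dated snapshot instead of overwriting rows, which is one simple way to keep the warehouse time-variant and non-volatile.

```python
import sqlite3
from datetime import date

# Hypothetical operational database holding current orders.
op = sqlite3.connect(":memory:")
op.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount_pence INTEGER)")
op.executemany("INSERT INTO orders VALUES (?, ?, ?)",
               [(1, " acme ltd", 1250), (2, "Beta plc ", 990)])

# Hypothetical warehouse database: a history table plus an archive table.
wh = sqlite3.connect(":memory:")
for table in ("sales_history", "sales_archive"):
    wh.execute(f"CREATE TABLE {table} "
               "(order_id INTEGER, customer TEXT, amount REAL, snapshot TEXT)")


def extract(conn):
    """Extract: read the current rows from the operational system."""
    return conn.execute("SELECT order_id, customer, amount_pence FROM orders").fetchall()


def transform(rows, snapshot):
    """Transform: clean names, convert pence to pounds, stamp the snapshot date."""
    return [(oid, cust.strip().upper(), pence / 100, snapshot.isoformat())
            for oid, cust, pence in rows]


def load(conn, rows):
    """Load: append only, so earlier snapshots are never overwritten."""
    conn.executemany("INSERT INTO sales_history VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def refresh(op_conn, wh_conn, snapshot):
    """One ETL cycle; the first call acts as the first-time load."""
    load(wh_conn, transform(extract(op_conn), snapshot))


def purge_or_archive(conn, cutoff):
    """Copy snapshots older than cutoff into the archive, then purge them."""
    params = (cutoff.isoformat(),)
    conn.execute("INSERT INTO sales_archive SELECT * FROM sales_history "
                 "WHERE snapshot < ?", params)
    conn.execute("DELETE FROM sales_history WHERE snapshot < ?", params)
    conn.commit()


refresh(op, wh, date(2024, 1, 31))                                       # first-time load
op.execute("UPDATE orders SET amount_pence = 1500 WHERE order_id = 1")  # source data changes
refresh(op, wh, date(2024, 2, 29))                                       # refresh
purge_or_archive(wh, date(2024, 2, 1))                                   # archive January
```

After the run, sales_history holds only the February snapshot (order 1 at 15.0) and sales_archive holds the January snapshot (order 1 at 12.5). Full snapshots are used here only for brevity; a real warehouse would more likely extract changes incrementally.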
What Can Go Wrong
• Data not available
  – Feature activated from implementation onwards
  – Massive data entry
  – E.g. different account structure
• Data incomplete
• Data inconsistent (e.g. engineering vs accounts)
• Wrong level of granularity
• Data not clean
• New system requires changes
  – New product codes

Data Cleaning Must Address
• Different departments recording the same information under different codes
• Multiple records of the same company (under different names)
• Fields missing in input tables (e.g. c/o)
• Different departments recording different addresses for the same customer
• Use of different units for time periods

Labour-Intensive Tasks
• Data entry
• Data checks
• Resolving conflicts
• Allocating new codes
• Solution: introduce as much automation as possible
  – SQL / SQL*Loader (Oracle)
  – Custom conversion programmes to extract, modify and upload data
  – Filtering
  – Parsing (e.g. Excel)
  – Staging areas for conversion in progress

Data Utilities
• Oracle provides a comprehensive set of data-handling utilities
• Export: transfers data between databases
  – Extracts both table structure and data content into a dump file
• Import: the corresponding facility, which reads a dump file back into a database
• SQL*Loader: automatic import from a variety of file formats into database tables
  – Needs a control file

Control Files: Using SQL*Loader
• Data loads into the database can be automated using the loader
  – Create a data file with the data(!)
  – Create a control file to guide the operation
• A load creates two files:
  – A log file
  – A “bad transactions” file
• A discard file is also created if the control file contains selection criteria

Example 1 – The Supplier File
• New supplier code to include the city where the firm is based
• Assignment of a category based on amounts purchased
OLD fields: Sup code (4 digits), Sup name, Sup address, City, Phone
NEW fields: Sup code (3 letters + 4 digits), Sup name, Sup address…, Phone, Cat (1, 2 or 3 depending on total purchases last year)

Example 2 – New Cost Accounting Structure
• Maintenance department expenditure: one account => separate accounts for different production activities
OLD fields: Intervention code, Desc., Date, Labour, Parts, Total
NEW fields: Intervention code, Desc., Date, Labour, Parts, Total, Account

Example 3: Merging Files
• Build a complete customer file from the Finance (accounts), Sales and Shipping files
OLD (Finance): CustID, name, address, city, account number, credit limit, balance, discount rates, rep_name
OLD (Sales): CustID*, name, address, city, sales_to_date
OLD (Shipping): CustID**, name, address, city, preferred haulier

Example 4: Change of Business Practices
• Payment by bank draft for international customers
• Automatic payment into account for national customers
• Payment direct into account for all customers

Data Staging Area
• The construction site for the warehouse
• Required by most scenarios
• Connected to a wide variety of sources
• Cleans / aggregates / computes / validates data
[Figure: Operational system -> Extract -> Data staging area (Transform) -> Transport (Load) -> Warehouse]

Remote Staging Model
[Figure: Data staging area within the warehouse environment. Operational environment: Operational system -> Extract, transform, transport -> Warehouse environment: Data staging area (Transform) -> Transport (Load) -> Warehouse]
[Figure: Data staging area in its own environment, avoiding negative impact on the warehouse environment. Operational environment: Operational system -> Extract, transform, transport -> Staging environment: Data staging area (Transform) -> Transport (Load) -> Warehouse environment: Warehouse]

Onsite Staging Model
[Figure: Data staging area within the operational environment, possibly affecting the operational system. Operational environment: Operational system -> Extract -> Data staging area (Transform) -> Transport (Load) -> Warehouse environment: Warehouse]

Data Mart
• A subset of a data warehouse that supports the requirements of a particular department or business function.
• Characteristics include:
  – Unlike data warehouses, data marts do not normally contain detailed operational data.
  – They may contain certain levels of aggregation.

Dependent Data Mart
[Figure: Operational systems, flat files and external data -> Data warehouse -> Data marts (Marketing, Sales, Finance, Human Resources)]

Independent Data Mart
[Figure: Operational systems, flat files and external data -> Data mart (Sales or Marketing), with no central data warehouse]

Reasons for Creating a Data Mart
• To give users more flexible access to the data they most often need to analyse.
• To provide data in a form that matches the specific needs of a group of users.
• To improve end-user response time.
• Potential users of a data mart are clearly defined and can be targeted for support.

Why Create a Data Mart?
• To provide appropriately structured data, as dictated by the requirements of the end-user access tools.
• Building a data mart is simpler (and much quicker) than establishing a corporate data warehouse.
• The cost of implementing data marts is typically far lower than that required to establish a data warehouse.
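As a sketch of how a dependent data mart can be derived from the warehouse, the example below builds a monthly, per-region summary for a hypothetical Sales department. It assumes Python's built-in sqlite3 module, and the names sales_fact, mart_sales_monthly and build_sales_mart are illustrative only. The mart drops product-level and day-level detail and keeps aggregated totals, in line with the data mart characteristics listed above.

```python
import sqlite3

# Hypothetical warehouse fact table with detailed, day-level sales rows.
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE sales_fact (sale_date TEXT, region TEXT, product TEXT, amount REAL)")
wh.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
    ("2024-01-05", "North", "P1", 100.0),
    ("2024-01-20", "North", "P2", 50.0),
    ("2024-01-11", "South", "P1", 70.0),
    ("2024-02-02", "North", "P1", 30.0),
])


def build_sales_mart(conn):
    """Rebuild the mart: a subset of the warehouse aggregated to month x region."""
    conn.execute("DROP TABLE IF EXISTS mart_sales_monthly")
    conn.execute("""
        CREATE TABLE mart_sales_monthly AS
        SELECT substr(sale_date, 1, 7) AS month,   -- 'YYYY-MM'
               region,
               SUM(amount) AS total_amount,
               COUNT(*)    AS n_sales
        FROM sales_fact
        GROUP BY substr(sale_date, 1, 7), region
    """)
    conn.commit()


build_sales_mart(wh)
for row in wh.execute("SELECT * FROM mart_sales_monthly ORDER BY month, region"):
    print(row)
```

With the sample rows, the mart holds three rows: January North (150.0 from 2 sales), January South (70.0 from 1 sale) and February North (30.0 from 1 sale). Rebuilding the mart from the warehouse on each refresh keeps it consistent with the central data, which is the main advantage of the dependent model.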