The Data Warehouse

“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of “all” an organisation’s data in support of management’s decision making process.”

– Data warehouses developed because some management questions need data from more than one operational database. E.g.:
– If you want to ask “How much does this customer owe?” then the sales database is probably the one to use. However, if you want to ask “Was this ad campaign more successful than that one?”, you require data from more disparate sources, e.g. production, marketing etc.

Characteristics of a Data Warehouse
• Subject oriented – products, customers…
• Integrated – inconsistencies removed
• Nonvolatile – stored in read-only format
• Time variant – data are normally time series
• Summarized – in decision-usable format
• Large volume – data sets are quite large
• Non normalized – often redundant: star flake schema

[Figure: A data warehouse process model. Components shown: Operational Database(s), External Data, ETL Routine (Extract/Transform/Load), Data Warehouse, Extract/Summarize Data, Dependent Data Mart, Independent Data Mart, Decision Support System, Report]

Meta Data
• A key concept behind the data warehouse (D.W.) is meta data.
– Meta data is data about the data (which has come from the data sources). It shows what data is contained in the D.W., where it came from, and what changes have been made to it.
• The metadata are essential ingredients in the transformation of raw data into knowledge. They are the “keys” that allow us to handle the raw data.
– For example, a line in a sales database may contain: 1023 K596 111.21
– This is mostly meaningless until we consult the metadata (in the data directory), which tells us it was store number 1023, product K596 and sales of $111.21.

Meta Data Answers Questions for Users of the Data Warehouse
• How do I find the data I need?
• What is the original source of the data?
• How was this summarization created?
• What queries are available to access the data?
• How have business definitions and terms changed over time?
• How do product lines vary across organizations?
• What business assumptions have been made?

Data marts
• A data mart is a data store that is subsidiary to a data warehouse of integrated data.
• The data mart is directed at a partition of data (subject area) that is created for the use of a dedicated group of users, and is sometimes termed a “subject warehouse”.
• The data mart might be a set of denormalised, summarised or aggregated data that can be placed on the data warehouse database or, more often, on a separate physical store.
• Data marts are “dependent data marts” when the data is sourced from the data warehouse.
• Independent data marts represent fragmented solutions to a range of business problems in the enterprise. Such a concept should not be deployed, as it lacks the “data integration” that is associated with data warehouses.

Independent Data marts
• However, such marts are not necessarily all bad.
• They are often a valid solution to a pressing business problem:
  – Extremely urgent user requirements
  – The absence of a budget for a full data warehouse
  – The decentralisation of business units

Data Warehousing Architecture
• Access Tools
  – The principal purpose of the data warehouse is to provide information for strategic decision making.
  – The main decision tools used to achieve this objective are:
    • Data mining tools
    • On-line analytical processing tools
    • Decision support systems / Executive information system tools

Data Warehousing Typology
– Central data warehouse: the D.W. can be at a single location.
– Distributed data warehouse: the collection of data is replicated around multiple locations. This means users have a local copy of the data warehouse, which can improve query run-times and reduce communications overheads.
(Note: The principles associated with distributed databases apply equally to distributed data warehouses; however, the static nature of the data needs to be factored into the design process.)
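
As an illustration of the dependent data mart described above, the following minimal sketch summarises warehouse data for one group of users. The rows in WAREHOUSE_SALES and the function name build_store_sales_mart are hypothetical, chosen only for this example; a real data mart would normally sit in a database rather than in memory.

```python
from collections import defaultdict

# Hypothetical warehouse rows: (store number, product code, sales in dollars).
WAREHOUSE_SALES = [
    (1023, "K596", 111.21),
    (1023, "K601", 50.00),
    (2044, "K596", 75.50),
]


def build_store_sales_mart(warehouse_rows):
    """Summarise warehouse rows into total sales per store."""
    totals = defaultdict(float)
    for store, _product, sales in warehouse_rows:
        totals[store] += sales
    # Round to cents so the summary is in a decision-usable format.
    return {store: round(total, 2) for store, total in totals.items()}


# The mart is "dependent" because its only source is the warehouse data.
store_sales_mart = build_store_sales_mart(WAREHOUSE_SALES)
```

Because the mart is derived only from the already integrated warehouse rows, it inherits their consistency, which is the property an independent data mart lacks.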
Data Warehouse Construction Tips
• Accept that your first try will require revision
• Examine the data: What formats and specific data are needed to support your application?
• Clean up the data before using it in the warehouse
• Build a prototype mini-data warehouse as a learning experience and revise strategies as necessary
• Plan on more users than anticipated wanting to use the warehouse
• Keep storage requirements constantly in mind

Sample type question
• Discuss how the D.W. can play a key role in strategic decision making.
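
The construction tips on cleaning data and building a prototype mini-data warehouse, together with the meta data example of the sales line 1023 K596 111.21, can be sketched as below. This is only an illustrative prototype under stated assumptions: the field names in SALES_METADATA, the sales_fact table and the sample raw lines are all hypothetical, and Python's built-in sqlite3 module stands in for a real warehouse database.

```python
import sqlite3

# Hypothetical metadata (data dictionary) describing each field of a raw sales line.
SALES_METADATA = [
    {"name": "store_number", "type": int, "source": "sales database"},
    {"name": "product_code", "type": str, "source": "sales database"},
    {"name": "sales_amount", "type": float, "source": "sales database"},
]


def extract_transform(raw_line, metadata=SALES_METADATA):
    """Split a raw line and use the metadata to name and type each field."""
    values = raw_line.split()
    if len(values) != len(metadata):
        raise ValueError(f"expected {len(metadata)} fields, got {len(values)}")
    return {field["name"]: field["type"](value)
            for field, value in zip(metadata, values)}


def load(connection, records):
    """Load cleaned records into a one-table prototype warehouse."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(store_number INTEGER, product_code TEXT, sales_amount REAL)"
    )
    connection.executemany(
        "INSERT INTO sales_fact VALUES (:store_number, :product_code, :sales_amount)",
        records,
    )
    connection.commit()


# Sample input: one clean line, one with stray whitespace, one malformed.
raw_lines = ["1023 K596 111.21", "  1023 K601 50.00  ", "bad line"]
clean_records, rejected = [], []
for line in raw_lines:
    try:
        clean_records.append(extract_transform(line))
    except ValueError:
        rejected.append(line)  # set aside for inspection, not loaded

warehouse = sqlite3.connect(":memory:")
load(warehouse, clean_records)
row_count = warehouse.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
```

Keeping the metadata in one place means the same dictionary both decodes the raw line and documents where each field came from; rejected lines are kept aside so that the cleaning step can be revised, in line with the tip that the first try will require revision.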