The Data Warehouse
“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of “all” an organisation’s data in support of management’s decision making process.”
– Data warehouses developed because some questions cannot be answered from a single operational database, e.g.:
– If you want to ask “How much does this customer owe?” then the sales database is probably the one to use.
– However, if you want to ask “Was this ad campaign more successful than that one?”, you require data from more disparate sources, e.g. production, marketing etc.

Figure: Organizational Data Flow and Data Storage Components

Characteristics of a Data Warehouse
• Subject oriented – organized based on use; e.g. business process
• Integrated – inconsistencies removed
• Nonvolatile – stored in read-only format
• Time variant – data are normally time series
• Summarized – in decision-usable format
• Large volume – data sets are quite large
• Non normalized – often redundant

Non-volatile and non-normalised data
• Data in the warehouse is not updated in real time but is refreshed from operational systems on a regular basis.
• New data is always added as a supplement to the database, rather than as a replacement.
• Data is non-normalised; this is achieved using the starflake and similar schemas.

© Pearson Education Limited 1995, 2005

Figure: A data warehouse process model (components shown: operational database(s), external data, ETL routine (Extract/Transform/Load), data warehouse, dependent data mart, independent data mart, extract/summarize data, decision support system, report)

Data Warehousing Architecture
• Fusion and cleansing: sourcing, acquisition, cleanup and transformation of data
– Implementing data warehouses involves extracting data from operational systems, including legacy systems, and putting it into a suitable format.
– These tools perform all the conversions, summarisations, key changes, structural changes, and condensations needed to transform disparate data into information that can be used by decision support tools.
– Data in a data warehouse are integrated.

Meta Data
• A key concept behind the data warehouse (DW) is meta data.
– Meta data is data about the data (which has come from the data sources) and shows what data is contained in the DW, where it came from, and what changes have been made to it.
• The metadata are essential ingredients in the transformation of raw data into knowledge. They are the “keys” that allow us to handle the raw data.
– For example, a line in a sales database may contain: 1023 K596 111.21
– This is mostly meaningless until we consult the metadata (in the data directory), which tells us it was store number 1023, product K596 and sales of $111.21.

Data marts
• A data mart is a data store that is subsidiary to a data warehouse of integrated data.
• The data mart is directed at a partition of data (subject area) that is created for the use of a dedicated group of users and is sometimes termed a “subject warehouse”.
• The data mart might be a set of denormalised, summarised or aggregated data that can be placed on the data warehouse database or, more often, on a separate physical store.
• Data marts are called “dependent data marts” when the data is sourced from the data warehouse.
• Independent data marts represent fragmented solutions to a range of business problems in the enterprise. Such a design should not be deployed, because it lacks the data integration that is associated with data warehouses.

Data Warehousing Typology
– The DW can be at a single location, i.e. a central data warehouse
– or –
– The collection of data is replicated around multiple locations. This means users have a local copy of the data warehouse. This can improve query run-times and reduce communications overheads.
Distributed data warehouse (Note: the principles associated with distributed databases apply equally to distributed data warehouses.)

Data Warehousing Design DT211/4

Designing Data Warehouses
• Need to find answers for questions such as:
– Which user requirements are most important?
– Which data should be considered first?
• The database component of a data warehouse is described using a technique called dimensionality modelling.

Dimensionality modeling
• A logical design technique that aims to present the data in a standard, intuitive form that allows for high-performance access.
• Every dimensional model (DM) is composed of one table with a composite primary key, called the fact table, and a set of smaller tables called dimension tables.

Figure: Fact and dimension tables for each business process of DreamHome

Figure: ER model of property sales business process of DreamHome

Figure: Star schema for property sales of DreamHome

Dimensionality modeling
• A star schema is a logical structure that has a fact table containing factual data in the centre, surrounded by dimension tables containing reference data, which can be denormalised.
• Star schemas can be used to speed up query performance by denormalizing reference information into a single dimension table.
• For example: the dimension tables (property for sale, client, branch and staff) all have city, region and county repeated.

Database Design Methodology for Data Warehouses
• The methodology includes the following steps:
– Choosing the process
– Choosing the facts and dimensions
– Choosing the facts
– Storing pre-calculations in the fact table

Choosing the process
• The process (function) refers to the subject matter of a particular data warehouse; it should answer the most commercially important business questions.
• Identify the discrete business processes, for example: property sales.
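
The fact table / dimension table structure described under dimensionality modeling can be sketched in a few lines. This is only an illustrative sketch using Python's built-in sqlite3 module; it is not the DreamHome schema from the slides. The table names, column names and rows (client_buyer, branch, property_sale, selling_price and so on) are assumptions made up for the example. It shows a fact table with a composite primary key built from dimension keys, dimension tables that repeat city, region and county, and a typical query that joins the fact table to one dimension.

```python
import sqlite3

# Hypothetical, simplified star schema: one fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client_buyer (            -- dimension table
    client_id   INTEGER PRIMARY KEY,
    client_name TEXT, city TEXT, region TEXT, county TEXT
);
CREATE TABLE branch (                  -- dimension table; city/region/county
    branch_id INTEGER PRIMARY KEY,     -- are repeated here (denormalised)
    city TEXT, region TEXT, county TEXT
);
CREATE TABLE property_sale (           -- fact table
    client_id     INTEGER REFERENCES client_buyer(client_id),
    branch_id     INTEGER REFERENCES branch(branch_id),
    sale_date     TEXT,
    selling_price REAL,                -- numeric, additive fact
    PRIMARY KEY (client_id, branch_id, sale_date)  -- composite key
);
""")

# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO client_buyer VALUES (?, ?, ?, ?, ?)", [
    (1, "Client One", "City A", "Region A", "County A"),
    (2, "Client Two", "City B", "Region B", "County B"),
])
conn.executemany("INSERT INTO branch VALUES (?, ?, ?, ?)", [
    (10, "City A", "Region A", "County A"),
    (20, "City B", "Region B", "County B"),
])
conn.executemany("INSERT INTO property_sale VALUES (?, ?, ?, ?)", [
    (1, 10, "2005-01-15", 100000.0),
    (2, 10, "2005-02-01", 150000.0),
    (2, 20, "2005-03-10", 300000.0),
])

# Star join: the fact table is joined to a dimension to give the facts context.
sales_by_branch_city = conn.execute("""
    SELECT b.city, SUM(f.selling_price)
    FROM property_sale AS f
    JOIN branch AS b ON b.branch_id = f.branch_id
    GROUP BY b.city
    ORDER BY b.city
""").fetchall()
print(sales_by_branch_city)
```

Because city is stored directly in the branch dimension, the query needs only one join per dimension; a fully normalised design would typically need further joins to separate city and region tables.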
Figure: ER model of property sales business process of DreamHome

Choosing the facts
• Decide what a record of the fact table is to represent, e.g. property sales.
• Facts should be numeric and additive.
• Identify the dimensions of the fact table. The contents of the fact table also determine the contents of each dimension table.
• Dimensions set the context for asking questions about the facts in the fact table, e.g. ClientBuyer: client no., client name, city, region, county.

Figure: Star schema for property sales of DreamHome
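
Why facts should be numeric and additive can be illustrated with a small sketch. The fact rows, key values and the roll_up helper below are all hypothetical and made up for the example; they are not taken from the DreamHome case study. The point is that an additive fact can be summed along any dimension and the totals always agree.

```python
from collections import defaultdict

# Made-up fact rows: dimension keys plus one numeric, additive fact.
facts = [
    {"branch": "B003", "client": "CR56", "month": "2005-01", "selling_price": 100_000},
    {"branch": "B003", "client": "CR74", "month": "2005-02", "selling_price": 150_000},
    {"branch": "B005", "client": "CR74", "month": "2005-02", "selling_price": 300_000},
]


def roll_up(rows, dimension, measure="selling_price"):
    """Sum an additive measure for each value of the chosen dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


by_branch = roll_up(facts, "branch")
by_month = roll_up(facts, "month")
grand_total = sum(row["selling_price"] for row in facts)

print(by_branch)
print(by_month)
print(grand_total)
```

A non-additive value, such as a percentage or an average price, could not be rolled up this way: summing it across branches or months would not give a meaningful total.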