Data in a Data Warehouse are Integrated

The Data Warehouse
“A data warehouse is a subject-oriented,
integrated, time-variant, and nonvolatile collection
of “all” an organisation’s data in support of
management’s decision making process.”
– Data warehouses developed because some questions need data from several sources, e.g.:
– if you want to ask “How much does this customer owe?”, the sales database is probably the one to use. However, if you want to ask “Was this ad campaign more successful than that one?”, you require data from more disparate sources, e.g. production, marketing, etc.
Organizational Data Flow and Data Storage Components
Characteristics of a Data Warehouse
• Subject oriented – organized based on use;
e.g. business process
• Integrated – inconsistencies removed
• Nonvolatile – stored in read-only format
• Time variant – data are normally time series
• Summarized – in decision-usable format
• Large volume – data sets are quite large
• Non-normalized – often redundant
Non-volatile and Non-normalised Data
• Data in the warehouse is not updated in
real-time but is refreshed from
operational systems on a regular basis.
• New data is always added as a
supplement to the database, rather than a
replacement.
• Data is non-normalised; this is achieved
using the starflake and similar
schemas.
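The append-only refresh described above can be sketched with Python's standard `sqlite3` module. This is a minimal illustration, not a production loader: the `sales_snapshot` table, the `refresh` helper and the customer balances are all hypothetical. Each refresh adds a new dated snapshot instead of overwriting earlier rows, so the warehouse stays non-volatile and keeps a time series.

```python
import sqlite3
from datetime import date


def refresh(conn, source_rows, load_date):
    """Append the current operational rows as a new dated snapshot.

    Existing warehouse rows are never updated or deleted (non-volatile).
    """
    conn.executemany(
        "INSERT INTO sales_snapshot (load_date, customer, balance) VALUES (?, ?, ?)",
        [(load_date.isoformat(), customer, balance) for customer, balance in source_rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_snapshot (load_date TEXT, customer TEXT, balance REAL)")

# Hypothetical operational data on two refresh dates: C1's balance changed.
refresh(conn, [("C1", 100.0), ("C2", 50.0)], date(2005, 1, 31))
refresh(conn, [("C1", 80.0), ("C2", 50.0)], date(2005, 2, 28))

# Both versions of C1's balance remain available for time-series analysis.
history = conn.execute(
    "SELECT load_date, balance FROM sales_snapshot WHERE customer = 'C1' ORDER BY load_date"
).fetchall()
print(history)  # [('2005-01-31', 100.0), ('2005-02-28', 80.0)]
```

An operational database would typically overwrite C1's balance in place; here the old value survives, which is what makes questions about change over time answerable.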
© Pearson Education Limited 1995, 2005
A data warehouse process model (figure components: operational database(s), external data, ETL routine (Extract/Transform/Load), data warehouse, Extract/Summarize data, dependent data mart, independent data mart, decision support system, report)
Data Warehousing Architecture
• Fusion and cleansing: sourcing, acquisition, cleanup and transformation of data
– Implementing data warehouses involves extracting data from operational systems, including legacy systems, and putting it into a suitable format.
– These tools perform all the conversions, summarisations, key changes, structural changes and condensations needed to transform disparate data into information that can be used by decision support tools.
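The transformation step can be illustrated with a small Python sketch. The two source extracts, their field names and their encodings are hypothetical: one system stores customer keys as `c-001`, dates as DD/MM/YYYY and amounts in cents, while the other already uses the target conventions. The transform functions perform the key change, date conversion and unit conversion so that the loaded rows are integrated (consistent).

```python
from datetime import datetime

# Hypothetical extracts from two operational systems that encode the same
# kinds of facts differently (assumed layouts, for illustration only).
sales_rows = [{"cust": "c-001", "date": "31/01/2005", "amount_cents": 11121}]
marketing_rows = [{"customer_id": "C001", "date": "2005-01-31", "amount": 45.50}]


def transform_sales(row):
    # Key change "c-001" -> "C001", DD/MM/YYYY -> ISO date, cents -> dollars.
    return {
        "customer": row["cust"].replace("-", "").upper(),
        "date": datetime.strptime(row["date"], "%d/%m/%Y").date().isoformat(),
        "amount": row["amount_cents"] / 100,
    }


def transform_marketing(row):
    # Already uses the target key and units; only the field names change.
    return {"customer": row["customer_id"], "date": row["date"], "amount": row["amount"]}


def etl():
    """Extract from both sources, transform to one format, load into a list."""
    warehouse = []  # stand-in for the warehouse load target
    warehouse.extend(transform_sales(r) for r in sales_rows)
    warehouse.extend(transform_marketing(r) for r in marketing_rows)
    return warehouse


rows = etl()
print(rows)
```

After the transform, both rows refer to the same customer key `C001` with ISO dates and dollar amounts, so a decision support tool can total them without knowing which source each came from.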
Data in a Data Warehouse are Integrated
Meta Data
• A key concept behind the D.W. is metadata.
– Metadata is data about the data (which has come from the data sources); it shows what data is contained in the DW, where it came from, and what changes have been made to it.
• The metadata are essential ingredients in the transformation of raw
data into knowledge. They are the “keys” that allow us to handle the
raw data.
– For example, a line in a sales database may contain:
1023 K596 111.21
– This is mostly meaningless until we consult the metadata (in the data
directory) that tells us it was store number 1023, product K596 and sales
of $111.21.
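The sales-record example can be made concrete with a short Python sketch. The `FieldMeta` class, the `SALES_METADATA` entries and the `interpret` helper are hypothetical stand-ins for a metadata directory: the metadata supplies the name, type and meaning of each position in the raw record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMeta:
    name: str
    convert: type  # used to convert the raw text value
    description: str


# Hypothetical metadata entry describing the raw sales record layout.
SALES_METADATA = [
    FieldMeta("store_no", int, "store number"),
    FieldMeta("product_code", str, "product code"),
    FieldMeta("sales", float, "sales in dollars"),
]


def interpret(raw_line, metadata):
    """Use the metadata to turn a raw, space-separated record into named, typed values."""
    values = raw_line.split()
    if len(values) != len(metadata):
        raise ValueError("record does not match the metadata layout")
    return {m.name: m.convert(v) for m, v in zip(metadata, values)}


record = interpret("1023 K596 111.21", SALES_METADATA)
print(record)  # {'store_no': 1023, 'product_code': 'K596', 'sales': 111.21}
```

Without `SALES_METADATA`, the string `1023 K596 111.21` is just three tokens; with it, the same bytes become a store number, a product code and a sales amount.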
Data marts
• A data mart is a data store that is subsidiary to a data
warehouse of integrated data.
• The data mart is directed at a partition of data (subject area) that is created for the use of a dedicated group of users, and is sometimes termed a “subject warehouse”.
• The data mart might be a set of denormalised, summarised or aggregated data that can be placed on the data warehouse database or, more often, on a separate physical store.
• Data marts are “dependent data marts” when their data is sourced from the data warehouse.
• Independent data marts represent fragmented solutions to a range of business problems in the enterprise. Such an approach should not be deployed, as it lacks the “data integration” associated with data warehouses.
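A dependent data mart can be sketched as a summarised table derived only from warehouse data. The example below uses Python's `sqlite3` module; the `warehouse_sales` and `mart_branch_monthly` tables and their rows are hypothetical. Because the mart is built from the warehouse rather than directly from operational systems, it inherits the warehouse's integrated data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding integrated, detailed sales data.
conn.execute("CREATE TABLE warehouse_sales (branch TEXT, month TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("B003", "2005-01", "K596", 100.0),
        ("B003", "2005-01", "K597", 50.0),
        ("B005", "2005-01", "K596", 70.0),
        ("B003", "2005-02", "K596", 30.0),
    ],
)

# Dependent data mart: one subject area (monthly branch sales) for one user
# group, summarised and sourced only from the warehouse.
conn.execute(
    """
    CREATE TABLE mart_branch_monthly AS
    SELECT branch, month, SUM(amount) AS total_sales
    FROM warehouse_sales
    GROUP BY branch, month
    """
)

mart = conn.execute("SELECT * FROM mart_branch_monthly ORDER BY branch, month").fetchall()
for row in mart:
    print(row)
```

In practice the mart would usually live on a separate physical store; an in-memory table is used here only to keep the sketch self-contained.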
Data Warehousing Typology
– The D.W. can be at a single location, i.e. a central data warehouse,
– or
– the collection of data is replicated across multiple locations, so that users have a local copy of the data warehouse. This can improve query run-times and reduce communication overheads. This is a distributed data warehouse. (Note: the principles associated with distributed databases apply equally to distributed data warehouses.)
Data Warehousing Design
DT211/4
Designing Data Warehouses
• Need to find answers for questions such
as:
– Which user requirements are most important?
– Which data should be considered first? …
• The database component of a data
warehouse is described using a technique
called dimensionality modelling.
Dimensionality modeling
• A logical design technique that aims to
present the data in a standard, intuitive
form that allows for high-performance
access
• Every dimensional model (DM) is
composed of one table with a composite
primary key, called the fact table, and a set
of smaller tables called dimension tables.
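The structure of a dimensional model can be sketched in SQL through Python's `sqlite3` module. The table names (`time_dim`, `branch_dim`, `sales_fact`) and rows are hypothetical. The fact table's composite primary key is made up of the keys of the smaller dimension tables, so the database rejects a second fact row for the same combination of dimension values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce fact-to-dimension references

# Smaller dimension tables, each with a simple primary key.
conn.execute("CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, month TEXT)")
conn.execute("CREATE TABLE branch_dim (branch_id INTEGER PRIMARY KEY, city TEXT)")

# Fact table: its composite primary key is built from the dimension keys.
conn.execute(
    """
    CREATE TABLE sales_fact (
        time_id   INTEGER REFERENCES time_dim(time_id),
        branch_id INTEGER REFERENCES branch_dim(branch_id),
        amount    REAL,
        PRIMARY KEY (time_id, branch_id)
    )
    """
)
conn.execute("INSERT INTO time_dim VALUES (1, '2005-01')")
conn.execute("INSERT INTO branch_dim VALUES (10, 'Glasgow')")
conn.execute("INSERT INTO sales_fact VALUES (1, 10, 111.21)")

try:
    # A second row with the same (time_id, branch_id) violates the composite key.
    conn.execute("INSERT INTO sales_fact VALUES (1, 10, 5.00)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```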
Fact and dimension tables for each business process of DreamHome
ER model of property sales business process of DreamHome
Star schema for property sales of DreamHome
Dimensionality modeling
• Star schema is a logical structure that has a fact table containing factual data in the centre, surrounded by dimension tables containing reference data, which can be denormalised.
• For example, the dimension tables (property for sale, client, branch and staff) all have city, region and county repeated.
• Star schemas can be used to speed up query
performance by denormalizing reference
information into a single dimension table.
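A query against a star schema can be sketched with Python's `sqlite3` module. The schema below is a heavily simplified, hypothetical version of a property sales star schema, and the branch, client and property values are made up. Because the branch dimension is denormalised (city, region and county repeated on each row), totalling sales by region needs only one join from the fact table, with no extra joins to separate region or county tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, simplified star schema: a central fact table surrounded by
# denormalised dimension tables that repeat city, region and county.
conn.executescript(
    """
    CREATE TABLE branch_dim (branch_no TEXT PRIMARY KEY, city TEXT, region TEXT, county TEXT);
    CREATE TABLE client_dim (client_no TEXT PRIMARY KEY, city TEXT, region TEXT, county TEXT);
    CREATE TABLE property_sale_fact (
        branch_no     TEXT REFERENCES branch_dim(branch_no),
        client_no     TEXT REFERENCES client_dim(client_no),
        property_no   TEXT,
        selling_price REAL,
        PRIMARY KEY (branch_no, client_no, property_no)
    );
    INSERT INTO branch_dim VALUES ('B003', 'Glasgow', 'Scotland', 'Lanarkshire');
    INSERT INTO branch_dim VALUES ('B007', 'Aberdeen', 'Scotland', 'Aberdeenshire');
    INSERT INTO branch_dim VALUES ('B005', 'London', 'South East', 'Greater London');
    INSERT INTO client_dim VALUES ('CR56', 'Glasgow', 'Scotland', 'Lanarkshire');
    INSERT INTO client_dim VALUES ('CR74', 'London', 'South East', 'Greater London');
    INSERT INTO property_sale_fact VALUES ('B003', 'CR56', 'PG4', 150000.0);
    INSERT INTO property_sale_fact VALUES ('B007', 'CR74', 'PA14', 120000.0);
    INSERT INTO property_sale_fact VALUES ('B005', 'CR74', 'PL94', 400000.0);
    """
)

# Star join: fact table joined to one dimension, grouped by a denormalised attribute.
totals = conn.execute(
    """
    SELECT b.region, SUM(f.selling_price)
    FROM property_sale_fact AS f
    JOIN branch_dim AS b ON f.branch_no = b.branch_no
    GROUP BY b.region
    ORDER BY b.region
    """
).fetchall()
print(totals)  # [('Scotland', 270000.0), ('South East', 400000.0)]
```

The trade-off is redundancy: "Scotland" is stored on several dimension rows, which a normalised design would avoid at the cost of more joins per query.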
Database Design Methodology for Data Warehouses
• ‘Methodology’ includes the following steps:
– Choosing the process
– Choosing the facts and dimensions
– Choosing the facts
– Storing pre-calculations in the fact table
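The last step, storing pre-calculations in the fact table, can be illustrated with a small Python sketch. The measure used here (`price_difference`, selling price minus asking price) and the `PropertySaleFact` and `load_fact` names are hypothetical examples, not taken from the original slides. The derived value is computed once when the fact row is loaded, so queries read it directly instead of recomputing it each time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertySaleFact:
    property_no: str
    asking_price: float
    selling_price: float
    # Pre-calculated measure stored alongside the base facts (hypothetical).
    price_difference: float


def load_fact(property_no, asking_price, selling_price):
    """Compute the derived measure once, at load time, rather than in every query."""
    return PropertySaleFact(property_no, asking_price, selling_price,
                            selling_price - asking_price)


facts = [
    load_fact("PG4", 145000.0, 150000.0),
    load_fact("PA14", 125000.0, 122000.0),
]
total = sum(f.price_difference for f in facts)
print(total)  # 2000.0
```

The cost of this choice is extra storage and the need to recompute the stored value if the base facts are ever corrected.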
Choosing the process
• The process (function) refers to the subject matter of a particular data warehouse, chosen to answer the most commercially important business questions.
• Identify the discrete business processes, for example property sales.
ER model of property sales business process of DreamHome
Choosing the facts
• Decide what a record of the fact table is to represent, e.g. a property sale.
• Facts should be numeric and additive.
• Identify the dimensions of the fact table. The contents of the fact table also determine the contents of each dimension table.
• Dimensions set the context for asking questions about the facts in the fact table, e.g. ClientBuyer: client no., client name, city, region, county.
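Why facts should be numeric and additive can be shown with a short Python sketch. The fact rows (region, month, selling price) are hypothetical. An additive measure such as selling price can be summed along any dimension, and every roll-up agrees on the same grand total.

```python
from collections import defaultdict

# Hypothetical fact rows: (client region, month, selling_price).
fact_rows = [
    ("Scotland", "2005-01", 150000.0),
    ("Scotland", "2005-02", 120000.0),
    ("South East", "2005-01", 400000.0),
]


def total_by(rows, dim_index):
    """Sum the additive measure along one dimension (0 = region, 1 = month)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dim_index]] += row[2]
    return dict(totals)


by_region = total_by(fact_rows, 0)
by_month = total_by(fact_rows, 1)

# Because the fact is additive, every roll-up gives the same grand total.
assert sum(by_region.values()) == sum(by_month.values()) == 670000.0
print(by_region)  # {'Scotland': 270000.0, 'South East': 400000.0}
print(by_month)   # {'2005-01': 550000.0, '2005-02': 120000.0}
```

A non-additive value, such as a percentage stored per row, could not be summed this way; it would need to be recomputed from additive components after aggregation.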
Star schema for property sales of DreamHome