Distributed Data warehouse

The Data Warehouse
“A data warehouse is a subject-oriented,
integrated, time-variant, and nonvolatile collection
of “all” an organisation’s data in support of
management’s decision making process.”
– Data warehouses developed because, for example:
– if you want to ask “How much does this customer owe?”, then the sales database is probably the one to use. However, if you want to ask “Was this ad campaign more successful than that one?”, you require data from more disparate sources, e.g. production, marketing, etc.
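A minimal Python sketch of the contrast, assuming two hypothetical sources: a sales table of `(customer_id, week, amount, paid)` rows and a marketing table recording which campaign ran in which week. The first question can be answered from the sales data alone; the second only becomes answerable once both sources are integrated on a shared key (here, the week). All names and figures are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sales database rows: (customer_id, week, amount, paid)
sales = [
    ("C1", 1, 100.0, True),
    ("C1", 2, 50.0, False),
    ("C2", 1, 80.0, True),
    ("C2", 3, 120.0, False),
    ("C3", 3, 60.0, True),
]

# Hypothetical marketing database rows: (week, campaign_name)
campaigns = [(1, "Radio"), (2, "Radio"), (3, "Online")]


def amount_owed(customer_id):
    """Answerable from the sales database alone: sum of this customer's unpaid sales."""
    return sum(amt for cust, _, amt, paid in sales if cust == customer_id and not paid)


def sales_by_campaign():
    """Needs both sources: join sales to campaigns on the shared 'week' key."""
    campaign_for_week = dict(campaigns)
    totals = defaultdict(float)
    for _, week, amt, _ in sales:
        totals[campaign_for_week.get(week, "None")] += amt
    return dict(totals)


print(amount_owed("C1"))    # 50.0
print(sales_by_campaign())  # {'Radio': 230.0, 'Online': 180.0}
```

Total sales per campaign period is only a crude success measure; the point is that the join across sources is what makes the question answerable at all.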
Characteristics of a Data Warehouse
• Subject oriented – products, customers…
• Integrated – inconsistencies removed
• Nonvolatile – stored in read-only format
• Time variant – data are normally time series
• Summarized – in decision-usable format
• Large volume – data sets are quite large
• Non-normalized – often redundant: star or snowflake schema
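The star schema named in the last characteristic can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under assumed, hypothetical table names (`fact_sales`, `dim_store`, `dim_product`, `dim_date`): a central fact table holds the measures, and each dimension table holds descriptive, deliberately redundant attributes that the fact rows point to.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables: descriptive, subject-oriented attributes.
cur.execute("CREATE TABLE dim_product (product_id TEXT PRIMARY KEY, category TEXT)")
cur.execute("CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT)")
cur.execute("CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER)")

# Fact table: one row per sale, with a key into each dimension (the points of the "star").
cur.execute("""
    CREATE TABLE fact_sales (
        store_id INTEGER REFERENCES dim_store(store_id),
        product_id TEXT REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    )
""")

cur.executemany("INSERT INTO dim_product VALUES (?, ?)", [("K596", "Garden"), ("K600", "Kitchen")])
cur.executemany("INSERT INTO dim_store VALUES (?, ?)", [(1023, "North"), (2044, "South")])
cur.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [(1, 2023, 1), (2, 2023, 2)])
cur.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1023, "K596", 1, 111.21), (1023, "K600", 2, 40.00), (2044, "K596", 2, 60.00)],
)

# A typical decision-support query: total sales by region and product category.
rows = cur.execute("""
    SELECT s.region, p.category, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_store s ON f.store_id = s.store_id
    JOIN dim_product p ON f.product_id = p.product_id
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()

for row in rows:
    print(row)
```

In a snowflake schema, `dim_product.category` would itself be normalized out into a separate category table, trading some redundancy for an extra join.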
Figure: A data warehouse process model (components: operational database(s) and external data; ETL routine (Extract/Transform/Load); data warehouse; extract/summarize data step; dependent data mart; independent data mart; decision support system; report).
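The ETL routine in the process model can be sketched as three small functions. This is a minimal sketch, assuming hypothetical operational rows held in memory and a plain list standing in for the warehouse; a real routine would read from and write to actual databases. The transform step is where inconsistencies are removed (the "integrated" characteristic), and the load step only appends (the "nonvolatile" characteristic).

```python
from datetime import date

# Hypothetical rows extracted from an operational database,
# with inconsistent formats as they might arrive from different source systems.
operational_rows = [
    {"store": "1023", "product": "k596", "sale_date": "2023-01-15", "amount": "111.21"},
    {"store": "1023", "product": "K600 ", "sale_date": "2023-02-03", "amount": "40"},
    {"store": "2044", "product": "K596", "sale_date": "2023-02-10", "amount": ""},  # missing amount
]


def extract():
    """Extract: read raw rows from the source (here, an in-memory list)."""
    return list(operational_rows)


def transform(rows):
    """Transform: clean and standardize rows so the warehouse data are consistent."""
    clean = []
    for r in rows:
        if not r["amount"]:
            continue  # reject incomplete rows (a real routine might log them instead)
        clean.append({
            "store_id": int(r["store"]),
            "product_id": r["product"].strip().upper(),
            "sale_date": date.fromisoformat(r["sale_date"]),
            "amount": round(float(r["amount"]), 2),
        })
    return clean


def load(rows, warehouse):
    """Load: append to the warehouse; existing rows are never updated."""
    warehouse.extend(rows)
    return warehouse


warehouse = load(transform(extract()), [])
print(len(warehouse))              # 2
print(warehouse[1]["product_id"])  # K600
```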
Meta Data
• A key concept behind D.W. is Meta Data.
– Meta data is data about the data (which has come from
the data sources) and shows what data is contained in
the DW, where it came from, and what changes have
been made to it.
• The metadata are essential ingredients in the transformation of raw
data into knowledge. They are the “keys” that allow us to handle the
raw data.
– For example, a line in a sales database may contain:
1023 K596 111.21
– This is mostly meaningless until we consult the metadata (in the data
directory) that tells us it was store number 1023, product K596 and sales
of $111.21.
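The record example can be made concrete with a small sketch. The metadata layout below (field names, types, descriptions) is a hypothetical stand-in for a data directory entry; the point is that the same raw line only becomes meaningful once each position is labelled and typed by the metadata.

```python
# Hypothetical data-directory entry describing the layout of one sales record.
sales_metadata = [
    {"name": "store_number", "type": int, "description": "Store number"},
    {"name": "product_code", "type": str, "description": "Product code"},
    {"name": "sales_amount", "type": float, "description": "Sales value in dollars"},
]


def interpret(raw_line, metadata):
    """Split a raw record, then label and convert each field using its metadata."""
    values = raw_line.split()
    if len(values) != len(metadata):
        raise ValueError("record does not match the metadata layout")
    return {field["name"]: field["type"](value) for field, value in zip(metadata, values)}


record = interpret("1023 K596 111.21", sales_metadata)
print(record)  # {'store_number': 1023, 'product_code': 'K596', 'sales_amount': 111.21}
```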
Meta Data Answers Questions for Users of the Data Warehouse
• How do I find the data I need?
• What is the original source of the data?
• How was this summarization created?
• What queries are available to access the data?
• How have business definitions and terms changed over time?
• How do product lines vary across organizations?
• What business assumptions have been made?
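A few of these questions can be illustrated with a toy metadata repository. This is a minimal sketch, assuming a hypothetical `MetadataEntry` record per warehouse table that stores its source, how it was derived, and a dated history of business definitions; real metadata repositories are far richer.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataEntry:
    """Hypothetical metadata record describing one warehouse table."""
    table: str
    source: str      # What is the original source of the data?
    derivation: str  # How was this summarization created?
    definitions: dict = field(default_factory=dict)  # term -> list of (year, definition)


catalog = {
    "monthly_sales_by_region": MetadataEntry(
        table="monthly_sales_by_region",
        source="Operational sales database (point-of-sale system)",
        derivation="SUM(amount) from fact_sales grouped by region and month",
        definitions={"region": [(2019, "Sales territory"),
                                (2022, "Sales territory after reorganisation")]},
    )
}


def find_tables(keyword):
    """How do I find the data I need? Search table names and derivations."""
    return [name for name, e in catalog.items() if keyword in name or keyword in e.derivation]


def definition_history(table, term):
    """How have business definitions and terms changed over time?"""
    return catalog[table].definitions.get(term, [])


print(find_tables("region"))
print(catalog["monthly_sales_by_region"].source)
print(definition_history("monthly_sales_by_region", "region"))
```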
Data marts
• A data mart is a data store that is subsidiary to a data
warehouse of integrated data.
• The data mart is directed at a partition of data (subject area) that is created for the use of a dedicated group of users, and is sometimes termed a “subject warehouse”.
• The data mart might be a set of denormalised, summarised
or aggregated data that can be placed on the data warehouse
database or more often placed on a separate physical store.
• Data marts can be “dependent data marts” when the data is
sourced from the data warehouse.
• Independent data marts, which are not sourced from the data warehouse, represent fragmented solutions to a range of business problems in the enterprise. However, such a concept should not be deployed, as it lacks the “data integration” associated with data warehouses.
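A dependent data mart can be sketched as a summarised, subject-specific table built only from the warehouse. This minimal `sqlite3` sketch assumes a hypothetical `warehouse_sales` table whose `department` column stands in for the subject area; for brevity the mart lives in the same database, whereas in practice it is often placed on a separate physical store.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical integrated warehouse table covering several subject areas.
conn.execute("CREATE TABLE warehouse_sales (region TEXT, department TEXT, month TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("North", "Marketing", "2023-01", 100.0),
        ("North", "Marketing", "2023-01", 50.0),
        ("South", "Marketing", "2023-02", 70.0),
        ("North", "Production", "2023-01", 300.0),
    ],
)

# Dependent data mart: summarised data for one subject area, sourced only from the warehouse.
conn.execute("""
    CREATE TABLE mart_marketing AS
    SELECT region, month, SUM(amount) AS total
    FROM warehouse_sales
    WHERE department = 'Marketing'
    GROUP BY region, month
""")

mart_rows = conn.execute("SELECT * FROM mart_marketing ORDER BY region, month").fetchall()
print(mart_rows)  # [('North', '2023-01', 150.0), ('South', '2023-02', 70.0)]
```

An independent data mart would instead be populated directly from operational systems, so nothing guarantees its figures agree with the warehouse.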
Independent Data marts
• However, such marts are not necessarily all bad.
• Often a valid solution to a pressing business
problem:
– Extremely urgent user requirements
– The absence of a budget for a full data
warehouse
– The decentralisation of business units
Data Warehousing Architecture
• Access Tools
– The principal purpose of the data warehouse is
to provide information for strategic decision
making.
– The main decision tools used to achieve this objective are:
• Data mining tools
• On-line analytical processing (OLAP) tools
• Decision support system / executive information system tools
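Two core on-line analytical processing operations, roll-up and slice, can be sketched over a small fact list. This is a minimal illustration with hypothetical `(region, product, quarter, sales)` rows in plain Python, not the interface of any particular OLAP tool.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, quarter, sales)
facts = [
    ("North", "K596", "Q1", 100.0),
    ("North", "K600", "Q1", 40.0),
    ("North", "K596", "Q2", 120.0),
    ("South", "K596", "Q1", 60.0),
    ("South", "K600", "Q2", 80.0),
]

DIMENSIONS = ("region", "product", "quarter")


def roll_up(rows, keep):
    """Roll-up: aggregate sales over every dimension not listed in `keep`."""
    idx = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[3]
    return dict(totals)


def slice_(rows, dimension, value):
    """Slice: fix one dimension to a single value."""
    i = DIMENSIONS.index(dimension)
    return [r for r in rows if r[i] == value]


print(roll_up(facts, ["region"]))                            # {('North',): 260.0, ('South',): 140.0}
print(roll_up(slice_(facts, "quarter", "Q1"), ["product"]))  # {('K596',): 160.0, ('K600',): 40.0}
```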
Data Warehousing Typology
– The DW can be at a single location, i.e. a central data warehouse.
– Distributed data warehouse: the collection of data is replicated across multiple locations, so users have a local copy of the data warehouse. This can improve query run-times and reduce communication overheads. (Note: the principles associated with distributed databases apply equally to distributed data warehouses; however, the static nature of the data needs to be factored into the design process.)
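One way the static nature of the data can shape a distributed design is sketched below. This is a minimal, hypothetical `DistributedWarehouse` class, assuming that data only change through periodic batch loads and that replicas are modified only by `refresh()`: queries are answered from the local copy, and because the data are append-only, a refresh only needs to ship the rows each site has not yet seen.

```python
class DistributedWarehouse:
    """Hypothetical sketch: one central warehouse replicated to several sites."""

    def __init__(self, sites):
        self.central = []                             # master copy, loaded by ETL
        self.replicas = {site: [] for site in sites}  # local read-only copies

    def load_batch(self, rows):
        """Periodic ETL load: rows are only ever appended, never updated."""
        self.central.extend(rows)

    def refresh(self):
        """Ship only the rows each replica has not seen yet (cheap because append-only)."""
        for replica in self.replicas.values():
            replica.extend(self.central[len(replica):])

    def is_stale(self, site):
        return len(self.replicas[site]) < len(self.central)

    def query(self, site, predicate):
        """Answer from the local copy, with no round trip to the central site."""
        return [row for row in self.replicas[site] if predicate(row)]


dw = DistributedWarehouse(["site_a", "site_b"])
dw.load_batch([("North", 100.0), ("South", 60.0)])
dw.refresh()
dw.load_batch([("North", 40.0)])                       # new batch, not yet replicated
print(dw.query("site_b", lambda r: r[0] == "North"))  # [('North', 100.0)]
print(dw.is_stale("site_b"))                           # True
dw.refresh()
print(dw.query("site_b", lambda r: r[0] == "North"))  # [('North', 100.0), ('North', 40.0)]
```

The trade-off shown is the usual one for replicated warehouses: fast local queries in exchange for data that can lag the central copy until the next refresh.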
Data Warehouse Construction Tips
• Accept that your first try will require revision
• Examine the data: What formats and specific data are
needed to support your application?
• Clean up the data before using it in the warehouse
• Build a prototype mini-data warehouse as a learning
experience and revise strategies as necessary
• Plan for more users wanting to use the warehouse than anticipated
• Keep storage requirements constantly in mind
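The tips to examine the data and clean it up can start with simple profiling. This minimal sketch assumes a hypothetical source extract and reduces each value to a format pattern (digits become `9`, letters become `A`), so missing values and inconsistent formats stand out before anything is loaded.

```python
import re
from collections import Counter

# Hypothetical extract from a source system, examined before loading.
rows = [
    {"customer_id": "C001", "phone": "01-234-5678", "joined": "2021-03-04"},
    {"customer_id": "C002", "phone": "012345678", "joined": "04/03/2021"},
    {"customer_id": None, "phone": "01-234-9999", "joined": "2021-05-11"},
]


def shape(value):
    """Reduce a value to its format pattern: digits -> 9, letters -> A."""
    if value is None:
        return "<null>"
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))


def profile(rows):
    """For each column, count how often each format pattern occurs."""
    return {col: Counter(shape(r[col]) for r in rows) for col in rows[0]}


for col, patterns in profile(rows).items():
    print(col, dict(patterns))
```

Here the profile would reveal one missing `customer_id` and two competing date formats in `joined`, both of which should be resolved before the data enter the warehouse.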
Sample question
• Discuss how a DW can play a key role in strategic decision making.