Typical architecture of a data warehouse

Sachin Goel (68)
Manav Mudgal (69)
Piyush Samsukha (76)
Rachit Singhal (82)
Richa Somvanshi (85)
Sahar( )
 Outline
 Data Warehousing
 Warehouse Architecture
 Its components
 Data flows
 Data marts
 Benefits of data warehousing
 Disadvantages of data warehousing
 Case Study
 What is data warehousing?
 A data warehouse is a subject-oriented, integrated,
time-variant, and non-volatile collection of data in
support of management’s decision-making process.
 A data warehouse combines data management and data
analysis.
 A data webhouse is a distributed data warehouse that
is implemented over the web with no central data
repository.
 Goal: to integrate enterprise-wide corporate data
into a single repository from which users can easily
run queries.
 What is data warehousing?
 Subject-orientedWH is organized around the major subjects of the
enterprise..rather than the major application areas.. This is reflected in the
need to store decision-support data rather than application-oriented data
 Integratedbecause the source data come together from different enterprisewide applications systems. The source data is often inconsistent using..The
integrated data source must be made consistent to present a unified view of the
data to the users
 Time-variantthe source data in the WH is only accurate and valid at some
point in time or over some time interval. The time-variance of the data
warehouse is also shown in the extended time that the data is held, the implicit
or explicit association of time with all data, and the fact that the data
represents a series of snapshots
 Non-volatiledata is not update in real time but is refresh from OS on a
regular basis. New data is always added as a supplement to DB, rather than
replacement. The DB continually absorbs this new data, incrementally
integrating it with previous data
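
The time-variant and non-volatile properties above can be illustrated with a minimal sketch. It assumes a tiny in-memory store; the names `Snapshot`, `SnapshotStore`, `refresh`, and `history`, and the customer balances, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One time-stamped record; it is never modified once stored."""
    as_of: date
    customer_id: int
    balance: float


class SnapshotStore:
    """Append-only store: a refresh adds rows and never overwrites them."""

    def __init__(self):
        self._rows = []

    def refresh(self, as_of, source_rows):
        # Non-volatile: new data supplements what is already held.
        for customer_id, balance in source_rows:
            self._rows.append(Snapshot(as_of, customer_id, balance))

    def history(self, customer_id):
        # Time-variant: every value is tied to the date it was valid.
        rows = [s for s in self._rows if s.customer_id == customer_id]
        return sorted(rows, key=lambda s: s.as_of)


store = SnapshotStore()
store.refresh(date(2024, 1, 31), [(1, 100.0), (2, 50.0)])
store.refresh(date(2024, 2, 29), [(1, 120.0), (2, 50.0)])
balances = [s.balance for s in store.history(1)]
```

After two refreshes the store holds four rows, and `balances` lists the January and February values for customer 1 side by side instead of only the latest one.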
 The architecture
[Figure: Typical architecture of a data warehouse. Components shown: operational data sources 1 to n and the operational data store (ODS); the load manager, warehouse manager, and query manager; the DBMS holding detailed data, lightly summarized data, highly summarized data, meta-data, and archive/backup data; and the end-user access tools: reporting, query, application development, and EIS (executive information system) tools, OLAP (online analytical processing) tools, and data mining tools.]
 The main components
 Operational data sourcesThe sources of data for the data
warehouse is supplied from:
 The data from the mainframe systems in the traditional network and
hierarchical format.
 Data can also come from the relational DBMS like Oracle, Informix.
 In addition to these internal data, operational data also includes external
data obtained from commercial databases and databases associated with
supplier and customers.
 Operational datastore(ODS)is a repository of current and
integrated operational data used for analysis. It is often structured and
supplied with data in the same way as the data warehouse, but may in fact
simply act as a staging area for data to be moved into the warehouse
 The main components
 Load manageralso called the frontend component, it performs all the
operations associated with the extraction and loading of data into the
warehouse. These operations include simple transformations of the data to
prepare the data for entry into the warehouse
 Warehouse managerperforms all the operations associated with the
management of the data in the warehouse. The operations performed by
warehouse manager include:
 Analysis of data to ensure consistency
 Transformation and merging the source data from temporary storage into
data warehouse tables
 Create indexes and views on the base table.
 Generation of aggregation
 Backing up and archiving of data
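
As a rough sketch of how the load manager and warehouse manager divide the work, the example below uses Python's built-in `sqlite3` module. The table names (`staging`, `sales`, `sales_by_region`), the columns, and the sample rows are assumptions made for illustration; consistency analysis and backup are left out to keep it short.

```python
import sqlite3


def load_manager(conn, source_rows):
    """Extract rows from a source and stage them after simple transformations."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging (region TEXT, item TEXT, amount REAL)"
    )
    # Simple transformations: trim text, normalize case, convert amounts.
    cleaned = [
        (region.strip().upper(), item.strip(), float(amount))
        for region, item, amount in source_rows
    ]
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", cleaned)


def warehouse_manager(conn):
    """Merge staged rows into the base table, index it, and build an aggregate."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (region TEXT, item TEXT, amount REAL)"
    )
    # Merge from temporary storage into the warehouse table.
    conn.execute("INSERT INTO sales SELECT region, item, amount FROM staging")
    conn.execute("DELETE FROM staging")
    # Create an index on the base table.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_region ON sales (region)")
    # Regenerate the aggregation.
    conn.execute("DROP TABLE IF EXISTS sales_by_region")
    conn.execute(
        "CREATE TABLE sales_by_region AS "
        "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
    )


conn = sqlite3.connect(":memory:")
load_manager(conn, [(" north ", "pen", "2.50"), ("North", "ink", "4"), ("south", "pen", "1.5")])
warehouse_manager(conn)
totals = dict(conn.execute("SELECT region, total FROM sales_by_region"))
```

Keeping the staging table separate from `sales` mirrors the idea of temporary storage: the load manager only prepares rows, and the warehouse manager decides when they become part of the warehouse.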
 The main components
 Query manageralso called backend component, it performs all the
operations associated with the management of user queries. The operations
performed by this component include directing queries to the appropriate
tables and scheduling the execution of queries
 Detailed, lightly and lightly summarized data,archive/backup
data
 Meta-data
 End-user access toolscan be categorized into five main groups: data
reporting and query tools, application development tools, executive
information system (EIS) tools, online analytical processing (OLAP) tools,
and data mining tools
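
The query manager's job of directing queries to the appropriate tables can be sketched as a lookup against the meta-data. The example assumes a hypothetical catalogue of one detailed table and two summary tables; `TableInfo`, `CATALOGUE`, `route`, and all table and column names are invented for illustration, and query scheduling is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfo:
    name: str
    columns: frozenset
    row_count: int


# Hypothetical meta-data describing detailed and summarized tables.
CATALOGUE = [
    TableInfo("sales_detail", frozenset({"day", "region", "item", "amount"}), 1_000_000),
    TableInfo("sales_by_region_month", frozenset({"month", "region", "amount"}), 1_200),
    TableInfo("sales_by_region_year", frozenset({"year", "region", "amount"}), 100),
]


def route(required_columns):
    """Pick the smallest table that contains every column the query needs."""
    needed = frozenset(required_columns)
    candidates = [t for t in CATALOGUE if needed <= t.columns]
    if not candidates:
        raise LookupError(f"no table provides columns {sorted(needed)}")
    return min(candidates, key=lambda t: t.row_count).name


coarse = route({"region", "amount"})   # answered from the highly summarized table
monthly = route({"month", "amount"})   # needs the lightly summarized table
itemized = route({"item", "amount"})   # only the detailed data has items
```

Choosing the smallest table that can answer the query is one simple policy; it shows why keeping lightly and highly summarized data alongside the detailed data can shorten response times.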
 Data flows
 Inflow: the processes associated with the extraction, cleansing, and loading of the
data from the source systems into the data warehouse.
 Upflow: the processes associated with adding value to the data in the warehouse
through summarizing, packaging, and distribution of the data.
 Downflow: the processes associated with archiving and backing up of data in the
warehouse.
 Outflow: the processes associated with making the data available to the end-users.
 Meta-flow: the processes associated with the management of the meta-data.
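
To make the five flows concrete, here is a toy sketch in which each flow is one method on an in-memory object. The class `TinyWarehouse`, its attributes, and the sample rows are hypothetical; a real warehouse would back each step with a DBMS.

```python
class TinyWarehouse:
    def __init__(self):
        self.detail = []    # detailed data
        self.summary = {}   # summarized data
        self.archive = []   # archive/backup data
        self.meta = {}      # meta-data: notes about what happened to the data

    def inflow(self, source_rows):
        # Extract, cleanse (drop rows with a missing amount), and load.
        clean = [(region, amount) for region, amount in source_rows if amount is not None]
        self.detail.extend(clean)
        self.metaflow("last_load_rows", len(clean))

    def upflow(self):
        # Add value by summarizing the detailed data per region.
        self.summary = {}
        for region, amount in self.detail:
            self.summary[region] = self.summary.get(region, 0) + amount
        self.metaflow("summary_regions", sorted(self.summary))

    def downflow(self):
        # Move detailed data into the archive.
        self.archive.extend(self.detail)
        self.detail = []
        self.metaflow("archived_rows", len(self.archive))

    def outflow(self, region):
        # Make the summarized data available to end-users.
        return self.summary.get(region)

    def metaflow(self, key, value):
        # Record meta-data about the other flows.
        self.meta[key] = value


wh = TinyWarehouse()
wh.inflow([("north", 10), ("south", None), ("north", 5), ("south", 7)])
wh.upflow()
wh.downflow()
north_total = wh.outflow("north")
```

Note that every other flow calls `metaflow`, which reflects the idea that meta-data management accompanies all movement of data through the warehouse.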
[Figure: Information flows of a data warehouse. The architecture diagram (operational data sources 1 to n, operational data store (ODS), load manager, warehouse manager, query manager, DBMS with detailed, lightly summarized, and highly summarized data, meta-data, archive/backup data, and end-user access tools) annotated with the five flows: inflow, upflow, downflow, outflow, and meta-flow.]
 Data mart
 data mart a subset of a data warehouse that supports the
requirements of particular department or business function
 The characteristics that differentiate data marts and data
warehouses include:
 a data mart focuses on only the requirements of users
associated with one department or business function.
 data marts do not normally contain detailed operational data,
unlike data warehouses
 as data marts contain less data compared with data warehouses,
data marts are more easily understood and navigated.
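
A data mart as a summarized, single-department subset of the warehouse can be sketched with `sqlite3`. The table `warehouse_sales`, the helper `build_data_mart`, the `mart_` naming scheme, and the sample rows are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (dept TEXT, day TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        ("toys", "2024-01-03", "kite", 12.0),
        ("toys", "2024-01-19", "kite", 8.0),
        ("toys", "2024-02-02", "ball", 5.0),
        ("garden", "2024-01-07", "hose", 30.0),
    ],
)


def build_data_mart(conn, dept):
    """Create a summarized table holding only one department's data."""
    if not dept.isidentifier():
        raise ValueError("department name must be a simple identifier")
    mart = f"mart_{dept}"
    conn.execute(f"DROP TABLE IF EXISTS {mart}")
    conn.execute(f"CREATE TABLE {mart} (month TEXT, item TEXT, total REAL)")
    # Keep monthly totals only: no detailed operational rows reach the mart.
    conn.execute(
        f"INSERT INTO {mart} "
        "SELECT substr(day, 1, 7), item, SUM(amount) "
        "FROM warehouse_sales WHERE dept = ? "
        "GROUP BY substr(day, 1, 7), item",
        (dept,),
    )
    return mart


mart = build_data_mart(conn, "toys")
rows = conn.execute(f"SELECT month, item, total FROM {mart} ORDER BY month, item").fetchall()
```

The mart ends up with two rows for the toys department, against four detailed rows in the warehouse, which is the sense in which a mart is smaller and easier to navigate.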
[Figure: Typical data warehouse and data mart architecture, labelled with three tiers (first, second, third). Alongside the warehouse components (operational data sources 1 to n, operational data store (ODS), load manager, warehouse manager, query manager, DBMS with detailed, lightly summarized, and highly summarized data, meta-data, and archive/backup data), it shows data marts holding summarized data in a relational database and in a multi-dimensional database, and the end-user access tools: reporting, query, application development, and EIS (executive information system) tools, OLAP (online analytical processing) tools, and data mining tools.]
Reasons for creating a data mart
 To give users access to the data they need to analyze most often
 To provide data in a form that matches the collective view of the data
by a group of users in a department or business function
 To improve end-user response time due to the reduction in the volume
of data to be accessed
 To provide appropriately structured data as dictated by the requirements
of end-user access tools
 Data marts normally use less data, so tasks such as data cleansing, loading,
transformation, and integration are far easier, and hence implementing
and setting up a data mart is simpler than establishing a corporate data
warehouse
 The cost of implementing data marts is normally less than that
required to establish a data warehouse
 The potential users of a data mart are more clearly defined and can be
more easily targeted to obtain support for a data mart project rather
than a corporate data warehouse project
 The benefits of data warehousing
 The potential benefits of data warehousing are:
 high returns on investment
 substantial competitive advantage
 increased productivity of corporate decision-makers
 Data warehouses facilitate decision support system
applications such as trend reports (e.g., the items with the
most sales in a particular area within the last two years),
exception reports, and reports that show actual
performance versus goals.
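
The trend report mentioned above (items with the most sales in a particular area over a recent period) might look like the following sketch. The `sales` table, its columns, the helper `top_items`, and the sample figures are hypothetical; the start of the reporting period is passed in as an ISO date string rather than computed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (area TEXT, day TEXT, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("north", "2021-05-01", "pen", 100.0),  # outside the reporting period
        ("north", "2023-03-10", "pen", 20.0),
        ("north", "2023-06-11", "ink", 35.0),
        ("north", "2024-01-05", "pen", 10.0),
        ("south", "2023-07-01", "ink", 99.0),   # different area
    ],
)


def top_items(conn, area, since, limit=3):
    """Return (item, total) pairs for one area, best-selling first."""
    return conn.execute(
        "SELECT item, SUM(amount) AS total FROM sales "
        "WHERE area = ? AND day >= ? "
        "GROUP BY item ORDER BY total DESC, item LIMIT ?",
        (area, since, limit),
    ).fetchall()


report = top_items(conn, "north", "2022-06-01")
```

Because ISO dates sort as text in chronological order, the `day >= ?` comparison works on plain strings here; a production schema would more likely use a proper date or time dimension.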
Disadvantages of warehousing
 Data warehouses are not the optimal environment for unstructured data.
 Because data must be extracted, transformed and loaded into the warehouse,
there is an element of latency in data warehouse data.
 Over their life, data warehouses can have high costs. Maintenance costs are
high.
 Data warehouses can get outdated relatively quickly. There is a cost of
delivering suboptimal information to the organization.
 There is often a fine line between data warehouses and operational systems.
Duplicate, expensive functionality may be developed. Or, functionality may
be developed in the data warehouse that, in retrospect, should have been
developed in the operational systems and vice versa.
TOSHIBA Case study