Download Data Warehousing

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Clusterpoint wikipedia , lookup

Big data wikipedia , lookup

Database model wikipedia , lookup

Functional Database Model wikipedia , lookup

Transcript
Data warehousing provides architectures & tools for business
executives to systematically organize, understand & use their data to
make strategic decisions. A data warehouse is a collection of
integrated databases designed to support a DSS.
Key Features:
 Subject Oriented: Data warehouse are
designed to help you analyze data. For e.g. To learn
more about your company’s sales data, you can build a
warehouse that concentrate on sales.
Integrated: Integration is closely related to subject
orientation. A data warehouse is usually constructed by
integrating multiple heterogeneous sources, such as
database, flat files &online transaction action records.
They must resolve problems like naming conflicts &
inconsistencies.
Time Variant: Data are stored to provide information from
historical perspective(e.g. The past 5-10 years). It varies from
time to time.
Non Volatile: This means, once entered into warehouse,
data should not change.
Summarized : Operational data are mapped into a
decision-usable format.
Metadata: Data about data are stored. In a data warehouse,
metadata describe the contents of a data warehouse and the
manner of its use.
Need for Data Warehousing
Industry has huge amount of
operational data.
Knowledge worker wants to turn
this data into useful information.
This information is used by them to
support strategic decision making.
Comparison between OLTP & OLAP systems.
S. No.
OLTP
OLAP
1.
Characteristics
Operational
Processing
Informational
Processing
2.
Orientation
Transaction
Analysis
3.
User
Clerk , DBA,
Database proff.
Knowledge
worker(e.g.
Manager, etc)
4.
Function
Day to Day
operations.
Decision
support
5.
Data
Current, up-to-date
Historical,
maintained
over time.
6.
DB Size
100 MB to GB
100 GB to TB
6.
Priority
High
High
performance,
flexibilty
High availability
7.
Unit of Work
Short, Simple
transaction
Complex
query
8.
Access
Read/Write
Mostly Read
9.
Focus
Data in
Info out
10.
Example
Purchasing,
banking,etc
Budgeting,
sales
forcasting,etc
ADVANTAGES & DISADVANTAGES OF DATA WAREHOUSES:
ADVANTAGES:
High investment: Implementation of DW by an organisation requires a
huge investment from Rs.10 lac to Rs.50 lac.
Increased productivity of corporate decision-makers: DW improves
the productivity of corporate decision-makers by creating an integrated database
of consistent, subject-oriented, historical data. By transforming data into
meaningful information a DW allows business managers to perform more
accurate & consistent analysis.
Most Cost-effective decision making: DW helps to reduce overall cost of
the product by reducing the number of channels.
DW provides retrieval of data without slowing down operational system.
DW facilitates DSS applications such as trend reports,exception reports &
reports that show actual performance versus goals.
DISADVANTAGES:
Underestimation of resources of data loading: Sometimes we
underestimate the time reqd. To extract, clean & load the data into DW.
It may take significant proportion of the total development time,
although some tools are there which are used to reduce the time & effort
spent on this process.
Increased end user demands: After satisfying the demands, the user
request’s increases. This is b’coz of Increasing awareness of the users on
capability & value of DW.
Data Homogenization: The concept of DW deals with similarity of
data formats b/w different data sources. Thus, results in to lose of some
imp. Value of the data.
High Maintenance: DW is usually not static and hav high cost. : DW
are high maintenance systems. Any reorganisation of the business
processes & the source systems may affect the DW & result in high
maintenance cost.
The three fundamental components that are supported by DW are:
 Load Manager.
Warehouse Manager.
Data Access Manager.
LOAD MANAGER: The components of DW is responsible for collection
of data from operational systems & convert them into usable form for
the user. This component is responsible for importing & exporting data
from operational systems. It performs the following task:
• Identification of Data.
•Validation of Data about accuracy.
•Extraction of Data from original source
•Cleansing of data by eliminating meaningless values & making iit
usable.
•Data formatting.
•Data standardisation by getting them into consistent from.
Warehouse Manager:
The warehouse manager is the centre of DW system & is the DW itself. It is
a large, physical database that holds a vast amount of information from a
wide variety of sources. The data within DW is organised such that it
becomes easy to find, use & update frequently from its sources.
Query Manager:
Query Manager component provides the end-users with access to the stored
warehouse info. Through the use of specialised end-user tools. Tools like
Query & reporting, OLAP, graphical & geographical info. Systems.
3-TIER DATAWAREHOUSE ARCHITECTURE:-
Data warehouse adopt a three tier architecture, these are:1.Bottom Tier(Data warehouse server)
2.Middle Tier(OLAP server)
3. Top Tier(Front end tools).
BOTTOM TIER: (how data is extracted from
sources)
It is a warehouse database server
Data is fed using Back end tools and utilities.
Data extracted using programs called gateways(ODBC,JDBC)
It also contains Meta data repository.
MIDDLE TIER:
The middle tier is an OLAP server that is typically implemented
using either
A relational OLAP (ROLAP) model, that is, an extended relational
DBMS that maps operations on multidimensional data to standard
relational operations;
A multidimensional OLAP (MOLAP) model, that is, a specialpurpose server that directly implements multidimensional data and
operations.
TOP TIER
The top tier is a front-end client layer, which contains query and
reporting tools, analysis tools, and/or data mining tools.
DATAWAREHOUSE BACK-ENDTOOLS AND UTILITIES
Data warehouse systems use back-end tools and utilities to populate and
refresh their data . These tools and utilities include the following
functions:
Data extraction
which typically gathers data from multiple, heterogeneous, and external
sources.
Data cleaning
which detects errors in the data and rectifies them when possible.
Data transformation
which converts data from legacy or host format to warehouse format
Load
which sorts, summarizes, consolidates, computes views, checks integrity,
and builds indices and partitions
Refresh
which propagates the updates from the data sources to the warehouse
Online Analysis Processing(OLAP):
It enables analysts, managers and executives to gain insight
information data through fast, consistent, interactive access to a
wide variety of possible views of information that has been
transformed from raw data to reflect the real dimensionality of the
enterprise as understood by the user.
Product
Data
Warehouse
Time
Multidimensional
Data
 Sales volume as a function of product, month, and
region
Dimensions: Product, Location, Time
Hierarchical summarization paths
Industry Region
Year
Product
Category Country Quarter
Product
City
Office
Month
Month Week
Day
OLAP Operations
Drill down: It navigates from less detailed data to more detail data. It
is reverse of roll up. This adds more detail to given data.
Product
Category e.g Electrical Appliance
Sub Category e.g Kitchen
Product e.g Toaster
Time
OLAP Operations:
Roll Up: Performs Aggregation on data cube,either by climbing up or
by dimension reduction.
Product
Sub Category e.g Kitchen
Product e.g Toaster
Time
OLAP Operations:
Slice and Dice: The slice operation performs a selection on one
dimension of a given cube, resulting in sub cube. The dice defines a
sub cube by performing a selection on two or more dimension.
Product
Product=Toaster
Time
Time
OLAP Operations
Pivot: is a visualization operation that rotates the data axes in view in order
to provide an alternative presentation of data.
Product
Product
Time
Region
DATA MART is generalised term used to describe DW environments that are
somehow smaller than others. Data Mart term often used to describe small,
single purpose mini DW.
“A subset of DW that support the requirements of a particular
department or business function” is known as DATA MART.”
It is normally in the form of summary data relating to a particular
department or business function. A DW may be constructed as a collection of
a subset of DATA MART’s. Usually implemented on low cost departmental
servers that are UNIX, Windows/NT based. Depending on sources of data,
DM’s can be categorised as:
Independent DM’s: sourced from data captured from one or more
operational systems or external information providers.
Dependent DM’s: sourced directly from enterprise Data warehouse.
Issues associated with development and management of data
marts:
Data Mart Functionality: successfully provide analysis using
OLAP and other data mining tools.
Data mart size: User expects faster response time from data
marts than from data warehouses.
Data mart load performances: A data mart has to balance 2
critical components ie. End user response time And data loading
performance.
Users access to data in multiple data marts.
Data mart internet/intranet access: this technology offers
users low cost access to data marts and DW’s using web browsers.
Data mart installation: ‘Data mart in a box’ & many products
are available that can provide a low cost data marts for an
organisation.
Data Warehousing Tools:
Data Warehouse
SQL Server 2000 DTS
Oracle 8i Warehouse Builder
OLAP tools
SQL Server Analysis Services
Oracle Express Server
Reporting tools
MS Excel Pivot Chart
VB Applications