Data Warehousing components
Overall architecture
Figure 4-1 Data Warehouse Architecture
Data warehouse database
• The central data warehouse database is a cornerstone of the
data warehousing environment
• Approaches to implementing this database include the following:
– Parallel relational database designs that require a parallel
computing platform
– An innovative approach to speed up a traditional RDBMS by using
new index structures to bypass relational table scans
– Multidimensional databases (MDDBs) that are based on proprietary
database technology or implemented using an already familiar
RDBMS. Multidimensional databases are designed to overcome
any limitations placed on the warehouse by the nature of the
relational data model
Sourcing, Acquisition, Cleanup and Transformation Tools
• A significant portion of the data warehouse implementation
effort is spent extracting data from operational systems and
putting it in a format suitable for informational
applications that will run off the data warehouse
• The data sourcing, cleanup, transformation, and migration
tools perform all of the conversions, summarizations, key
changes, structural changes, and condensations needed to
transform disparate data into information that can be used
by the decision support tool
• The functionality includes:
– Removing unwanted data from operational databases
– Converting to common data names and definitions
– Calculating summaries and derived data
– Establishing defaults for missing data
– Accommodating source data definition changes
• The data sourcing, cleanup, extract, transformation and
migration tools have to deal with some significant issues,
as follows:
– Database heterogeneity.
– Data heterogeneity.
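The cleanup and transformation functions listed above can be illustrated with a small sketch. It assumes two hypothetical operational sources that name the same fields differently; the field names, the mapping table, and the default values are all invented for illustration and do not describe any particular tool.

```python
from collections import defaultdict

# Hypothetical mapping from source-specific field names to common names.
FIELD_MAP = {
    "cust_nm": "customer", "CustomerName": "customer",
    "amt": "amount", "Amount": "amount",
    "rgn": "region", "Region": "region",
}

# Hypothetical defaults applied when a value is missing.
DEFAULTS = {"region": "UNKNOWN", "amount": 0.0}

# Fields kept in the warehouse; anything else is treated as unwanted data.
WANTED = {"customer", "amount", "region"}


def transform(record):
    """Rename fields to common names, drop unwanted ones, fill defaults."""
    out = {}
    for name, value in record.items():
        common = FIELD_MAP.get(name, name)
        if common in WANTED and value is not None:
            out[common] = value
    for name, default in DEFAULTS.items():
        out.setdefault(name, default)
    return out


def summarize_by_region(records):
    """Calculate a simple summary: total amount per region."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["region"]] += float(rec["amount"])
    return dict(totals)


# Two hypothetical sources with different naming conventions.
source_a = [{"cust_nm": "Ann", "amt": 10.0, "rgn": "East", "session_id": 7}]
source_b = [{"CustomerName": "Bob", "Amount": 5.5, "Region": None}]

clean = [transform(r) for r in source_a + source_b]
summary = summarize_by_region(clean)
```

Here the name mapping stands in for the data heterogeneity issue: each source uses its own names for the same attribute, and the transformation step reconciles them before any summary is calculated.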
Metadata
• Metadata is data about data that describes the data
warehouse.
• It is used for building, maintaining, managing, and
using the data warehouse.
• Metadata can be classified into the following:
– Technical metadata
– Business metadata
– Data warehouse operational information such as data
history (snapshots, versions), ownership, extract audit
trail, usage data
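One way to picture the three classes of metadata is as separate records attached to the same warehouse column. The sketch below is only an assumption about how such a catalog entry might be laid out; every class name, field name, and value is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TechnicalMetadata:
    # How the data is stored and where it came from (hypothetical fields).
    table: str
    column: str
    data_type: str
    source_system: str


@dataclass
class BusinessMetadata:
    # What the data means in business terms (hypothetical fields).
    business_name: str
    description: str


@dataclass
class OperationalInfo:
    # Data history, ownership, and extract audit trail (hypothetical fields).
    owner: str
    snapshot_version: int
    extract_log: List[str] = field(default_factory=list)


@dataclass
class CatalogEntry:
    technical: TechnicalMetadata
    business: BusinessMetadata
    operational: OperationalInfo


entry = CatalogEntry(
    technical=TechnicalMetadata("sales_fact", "amt", "REAL", "orders_db"),
    business=BusinessMetadata("Sale amount", "Invoiced value of one sale"),
    operational=OperationalInfo(owner="finance", snapshot_version=1),
)

# Each extract run appends to the audit trail and bumps the snapshot version.
entry.operational.extract_log.append("extract from orders_db completed")
entry.operational.snapshot_version += 1
```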
Access Tools
• The principal purpose of the data warehouse is to provide
information to business users for strategic decision
making.
• These users interact with the data warehouse using front-end tools.
• Many of these tools require an information specialist, a
domain expert, who can analyze the information and can
interact with the data warehousing environment in order to
reach meaningful conclusions.
• This is especially true for data mining tools when defining
the problem, configuring the tool, and analyzing the
results.
Tool Taxonomy
• The end user tools area spans a number of components.
For example, all end user tools use metadata definitions to
obtain access to data stored in the warehouse, and some of
these tools may employ additional or intermediary data
stores.
• These tools can be divided into five main groups:
– Data query and reporting tools
– Application development tools
– Executive information system (EIS) tools
– Online analytical processing (OLAP) tools
– Data mining tools
Data Mining tools
• Most organizations engage in data mining to do the following:
– Discover knowledge: segmentation, classification, association and
preferencing
– Visualize data
– Correct data
• The strategic value of data mining is time-sensitive, especially in the
retail, marketing and finance sectors of the industry
• Using data mining to build predictive models in decision making has
several benefits.
– A model should explain why a particular decision was made
– Adjusting a model based on feedback from future decisions will lead to
experience accumulation and true organizational learning.
– Finally, a predictive model can be used to automate a decision step in a
larger process.
Data Marts
• The concept of the data mart is causing a lot of excitement and
attracting much attention in the data warehouse industry.
• In general, data marts are being presented as an inexpensive alternative
to a data warehouse, taking significantly less time and money to build.
• The data mart is directed at a partition of data (often called a subject
area) that is created for the use of a dedicated group of users.
• Unfortunately, the misleading statements about the simplicity and low
cost of data marts sometimes result in organizations or vendors
incorrectly positioning them as an alternative to the data warehouse.
• In summary, data marts present two problems: the problem of
scalability in situations where an initial small data mart grows quickly
in multiple dimensions, and the problem of data integration.
Data Warehouse Administration and Management
• In summary, managing data warehouses includes the
following:
– Security and priority management
– Monitoring updates from multiple sources
– Data quality checks
– Managing and updating metadata
– Auditing and reporting data warehouse usage and status
– Purging data
– Replicating, subsetting, distributing data
– Backup and recovery
– Data warehouse storage management (for example, capacity
planning; hierarchical storage management, or HSM; purging of
aged data)
Impact of the web
• Even a surface analysis of the information
technology industry indicates that the two most
pervasive themes in computing have been the
Internet and data warehousing.
• From a marketing perspective, a marriage of these
two giant technologies is a natural and
unavoidable event.
• The reason for these trends is simple: the
compelling advantages in using the Web for access
are magnified even further in a data warehouse
Impact of the web (cont’d)
• The intranet movement has resulted in a drastic decrease in
the capital intensity and the project expense of creating and
deploying applications on the web
• Today, corporations can set up an RDBMS server, DSS
server and Web server in a single location; build a decision
support application using standard tools; and then
immediately deploy to hundreds or even thousands of users
anywhere on the corporate intranet.
• Application maintenance, code upgrades, and security
privileges are now administered centrally.
• An example is the Sabre computer reservation system.
Approaches to using the web
Figure 4-2 Web-enabled Information delivery
Design Options and Issues
• Issues: Web access offers some clear advantages over existing
architectures, but there are some very clear issues and concerns.
• These issues include the following:
– Security
– Performance
– Statelessness
– Functionality
– Presentation
• Therefore, we can offer the following suggestions:
– Design your data warehouse very carefully
– Minimize the number and size of data transmissions per access
– Use more server-based processing, including stored procedures and
server-side functions
– Ensure that the server is extensible, highly available, and that its workload
is balanced.
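The advice to minimize data transmission and favor server-based processing can be sketched by comparing two ways of answering the same question. The example uses Python's built-in `sqlite3` module as a stand-in for the warehouse RDBMS, and the `sales` table and its rows are hypothetical.

```python
import sqlite3

# In-memory database standing in for the warehouse server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 10.0), ("East", 20.0), ("West", 5.0)],
)

# Client-heavy approach: every detail row is transferred, and the
# client would still have to aggregate them itself.
all_rows = conn.execute("SELECT region, amount FROM sales").fetchall()

# Server-based approach: the database aggregates, and only the
# summary rows are transferred.
summary_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

conn.close()
```

With three detail rows the difference is trivial, but the second query returns one row per region no matter how many detail rows the table holds, which is the point of pushing the work to the server.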
XML
• XML stands for eXtensible Markup Language
• XML should be:
– Easy to use over the Internet
– Compatible with SGML
– Capable of being processed by easy-to-write programs
– Legible and reasonably clear to users
• In addition to the XML standard, several auxiliary
standards are needed to complete the functionality of
XML. For example, XSL, XLink, and XPointer are among
the proposed standards that provide XML support for style
sheets, hyperlinks, and other features
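The goal of being processable by easy-to-write programs can be illustrated with a few lines of Python using the standard-library `xml.etree.ElementTree` module. The document structure below (`sales`, `record`, `amount`) is a hypothetical example, not a format defined by any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document describing sales amounts per region.
doc = """<sales>
  <record region="East"><amount>10.0</amount></record>
  <record region="West"><amount>5.5</amount></record>
</sales>"""

root = ET.fromstring(doc)

# Read the region attribute and the amount element from each record.
totals = {
    rec.get("region"): float(rec.findtext("amount"))
    for rec in root.findall("record")
}
```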