• Data Warehouse
https://store.theartofservice.com/the-data-warehouse-toolkit.html
Dimension (data warehouse) - Conformed dimension
A conformed dimension is a set of data
attributes that have been physically
referenced in multiple database tables
using the same key value to refer to the
same structure, attributes, domain values,
definitions and concepts. A conformed
dimension cuts across many facts.
Dimensions are conformed when they are
either exactly the same (including keys) or
one is a perfect subset of the other. Most
important, the row headers produced in
two different answer sets from the same
conformed dimension(s) must be able to
match perfectly.
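A minimal sketch of why matching row headers matter, assuming two hypothetical fact tables (sales and inventory) that reference one shared product dimension. All table names, keys and values are illustrative and do not come from the slides.

```python
# Hypothetical conformed product dimension shared by two fact tables.
product_dim = {
    1: {"product": "Widget", "category": "Hardware"},
    2: {"product": "Gadget", "category": "Hardware"},
    3: {"product": "Manual", "category": "Books"},
}

# Both fact tables refer to the dimension through the same key values.
sales_facts = [(1, 10), (2, 5), (3, 7), (1, 3)]   # (product_key, units_sold)
inventory_facts = [(1, 40), (2, 25), (3, 12)]     # (product_key, units_on_hand)


def total_by_category(facts):
    """Aggregate a fact table up to the category row header of the dimension."""
    totals = {}
    for product_key, value in facts:
        category = product_dim[product_key]["category"]
        totals[category] = totals.get(category, 0) + value
    return totals


def drill_across(left, right):
    """Merge two answer sets on their row headers; the headers must match exactly."""
    if left.keys() != right.keys():
        raise ValueError("row headers do not match; dimensions are not conformed")
    return {header: (left[header], right[header]) for header in left}


sales = total_by_category(sales_facts)
stock = total_by_category(inventory_facts)
report = drill_across(sales, stock)
```

Because both answer sets are labelled from the same dimension, `report` can pair units sold with units on hand per category without any translation step.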
Conformed dimensions are either
identical or strict mathematical subsets
of the most granular, detailed
dimension.
Dimension (data warehouse) - Junk dimension
A junk dimension is a convenient grouping of
typically low-cardinality flags and indicators
One solution is to create a new dimension for
each of the remaining attributes, but due to
their nature, it could be necessary to create a
vast number of new dimensions resulting in a
fact table with a very large number of foreign
keys. The designer could also decide to leave
the remaining attributes in the fact table but
this could make the row length of the table
unnecessarily large if, for example, one of the
attributes is a long text string.
The solution to this challenge is to
identify all the attributes and then put
them into one or several Junk
Dimensions
Junk dimensions are also appropriate for
placing attributes like non-generic comments
from the fact table. Such attributes might
consist of data from an optional comment
field when a customer places an order and as
a result will probably be blank in many cases.
Therefore the junk dimension should contain
a single row representing the blanks as a
surrogate key that will be used in the fact
table for every row returned with a blank
comment field
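The junk dimension idea above can be sketched as follows. This is an illustration under assumptions: the flag names, their value domains and the helper functions are hypothetical, and a real load process would usually build these rows in the warehouse database rather than in memory.

```python
from itertools import product

# Hypothetical low-cardinality flags that would otherwise sit in the fact table.
FLAG_DOMAINS = {
    "is_gift": (False, True),
    "payment_type": ("cash", "card"),
    "is_rush_order": (False, True),
}


def build_junk_dimension(domains):
    """Create one row per combination of flag values, keyed by a surrogate key."""
    names = list(domains)
    rows = {}
    for key, combo in enumerate(product(*domains.values()), start=1):
        rows[key] = dict(zip(names, combo))
    return rows


junk_dim = build_junk_dimension(FLAG_DOMAINS)
# Reverse lookup used while loading facts: flag values -> surrogate key.
lookup = {tuple(row.values()): key for key, row in junk_dim.items()}

# A fact row then carries one foreign key instead of three flag columns.
fact_row = {"order_amount": 59.90, "junk_key": lookup[(True, "card", False)]}

# Comment dimension: surrogate key 0 is the single row representing a blank comment.
comment_dim = {0: ""}


def comment_key(text):
    """Return the surrogate key for a comment, reusing key 0 for blanks."""
    text = text.strip()
    if not text:
        return 0
    for key, existing in comment_dim.items():
        if existing == text:
            return key
    new_key = max(comment_dim) + 1
    comment_dim[new_key] = text
    return new_key
```

Three two-valued flags give eight junk rows, so the fact table needs only one small integer column for all of them.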
Dimension (data warehouse) - Degenerate dimension
A degenerate dimension is a key, such as
a transaction number, invoice number,
ticket number, or bill-of-lading number,
that has no attributes and hence does not
join to an actual dimension table.
Degenerate dimensions are very
common when the grain of a fact table
represents a single transaction item or
line item because the degenerate
dimension represents the unique
identifier of the parent. Degenerate
dimensions often play an integral role in
the fact table's primary key.
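As a rough illustration, assuming a hypothetical line-item fact table, the invoice number below is kept directly in the fact rows and joins to no dimension table. The column names and values are invented for the example.

```python
from collections import defaultdict

# Hypothetical line-item fact rows. invoice_number is a degenerate dimension:
# it lives in the fact table and has no dimension table of its own.
line_item_facts = [
    {"invoice_number": "INV-1001", "line_number": 1, "product_key": 11, "amount": 20.0},
    {"invoice_number": "INV-1001", "line_number": 2, "product_key": 12, "amount": 5.0},
    {"invoice_number": "INV-1002", "line_number": 1, "product_key": 11, "amount": 20.0},
]


def totals_by_invoice(facts):
    """Group line items by the degenerate dimension to rebuild each parent invoice."""
    totals = defaultdict(float)
    for row in facts:
        totals[row["invoice_number"]] += row["amount"]
    return dict(totals)


# The degenerate dimension plus the line number identifies each fact row.
primary_keys = {(row["invoice_number"], row["line_number"]) for row in line_item_facts}
invoice_totals = totals_by_invoice(line_item_facts)
```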
Dimension (data warehouse) - Role-playing dimension
Dimensions are often recycled for
multiple applications within the same
database. For instance, a "Date"
dimension can be used for "Date of
Sale", as well as "Date of Delivery",
or "Date of Hire". This is often
referred to as a "role-playing
dimension".
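One common way to express a role-playing dimension is to join the same physical table more than once under different aliases. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table and column names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE date_dim (
        date_key INTEGER PRIMARY KEY,
        calendar_date TEXT,
        month_name TEXT
    );
    CREATE TABLE order_fact (
        order_id INTEGER PRIMARY KEY,
        sale_date_key INTEGER REFERENCES date_dim(date_key),
        delivery_date_key INTEGER REFERENCES date_dim(date_key),
        amount REAL
    );
    INSERT INTO date_dim VALUES
        (20240130, '2024-01-30', 'January'),
        (20240202, '2024-02-02', 'February');
    INSERT INTO order_fact VALUES (1, 20240130, 20240202, 99.0);
""")

# The single date_dim table plays two roles: date of sale and date of delivery.
rows = conn.execute("""
    SELECT f.order_id, sale.month_name, delivery.month_name
    FROM order_fact AS f
    JOIN date_dim AS sale ON sale.date_key = f.sale_date_key
    JOIN date_dim AS delivery ON delivery.date_key = f.delivery_date_key
""").fetchall()
```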
Dimension (data warehouse) - Use of ISO representation terms
When referencing data from a metadata
registry such as ISO/IEC 11179,
representation terms such as Indicator (a
boolean true/false value) and Code (a set of
non-overlapping enumerated values) are
typically used as dimensions. For
example, using the National Information
Exchange Model (NIEM), the data element
name would be PersonGenderCode and
the enumerated values would be male,
female and unknown.
Dimension (data warehouse) - Common patterns
One of the reasons to have date
dimensions is to place calendar
knowledge in the data warehouse
instead of hard-coding it in an
application.
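A small sketch of what placing calendar knowledge in the warehouse can look like: one row per day, with attributes computed once at load time. The attribute names and the `YYYYMMDD` integer key are assumptions, not something the slides prescribe.

```python
from datetime import date, timedelta


def build_date_dimension(start, end):
    """Return one row per calendar day between start and end, inclusive."""
    rows = []
    current = start
    while current <= end:
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),
            "calendar_date": current.isoformat(),
            "month": current.month,
            "quarter": (current.month - 1) // 3 + 1,
            "is_weekend": current.weekday() >= 5,  # Saturday=5, Sunday=6
        })
        current += timedelta(days=1)
    return rows


date_dim = build_date_dimension(date(2024, 1, 1), date(2024, 12, 31))
```

Queries can then filter on `quarter` or `is_weekend` directly instead of re-deriving them in every application.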
Having both the date and the time of day in
the same dimension may easily result in a
huge dimension with millions of rows.
As a rule of thumb, a time-of-day
dimension should only be created if
hierarchical groupings are needed or
if there are meaningful textual
descriptions for periods of time within
the day (e.g., “evening rush” or “first
shift”).
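To make the size argument concrete, the sketch below builds a minute-grain time-of-day dimension and compares row counts. The minute grain, the ten-year span (leap days ignored) and the shift boundaries are all assumptions chosen for illustration.

```python
def shift_for(hour):
    """Map an hour of the day to a hypothetical shift label."""
    if 6 <= hour < 14:
        return "first shift"
    if 14 <= hour < 22:
        return "second shift"
    return "third shift"


# One row per minute of the day: 24 * 60 = 1440 rows.
time_of_day_dim = [
    {"time_key": hour * 60 + minute, "hour": hour, "minute": minute, "shift": shift_for(hour)}
    for hour in range(24)
    for minute in range(60)
]

days_in_ten_years = 3650  # approximation that ignores leap days

# Separate date and time-of-day dimensions versus one combined dimension.
separate_rows = days_in_ten_years + len(time_of_day_dim)
combined_rows = days_in_ten_years * len(time_of_day_dim)
```

Under these assumptions the two separate dimensions hold about five thousand rows in total, while a combined date-and-time dimension would need over five million.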
If the rows in a fact table come
from several time zones, it might be
useful to store date and time in both
local time and a standard time.
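A possible sketch of storing both representations, assuming UTC as the standard time and a fixed UTC-5 offset as a stand-in for a real time zone. The function and field names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_fact_timestamps(local_dt):
    """Return local and UTC date/time strings for a timezone-aware datetime."""
    if local_dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc_dt = local_dt.astimezone(timezone.utc)
    return {
        "local_date": local_dt.date().isoformat(),
        "local_time": local_dt.time().isoformat(),
        "utc_date": utc_dt.date().isoformat(),
        "utc_time": utc_dt.time().isoformat(),
    }


# Fixed offset used only for illustration; it does not model daylight saving time.
utc_minus_five = timezone(timedelta(hours=-5))
event = datetime(2024, 1, 15, 22, 30, tzinfo=utc_minus_five)
stamps = to_fact_timestamps(event)
```

Note that the local and standard dates can differ: here a late-evening local event falls on the next day in UTC.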
Teradata - Active enterprise data warehouse
Teradata Active Enterprise Data
Warehouse is the platform that runs the
Teradata Database, with added data
management tools and data mining
software.
The data warehouse differentiates
between “hot and cold” data – meaning
that the warehouse puts data that is not
often used in a slower storage section.
As of October 2010, Teradata uses
Xeon 5600 processors for the server
nodes.
Teradata Database 13.10 was announced
in 2010 as the company’s database
software for storing and processing data.
Teradata Database 14 was sold as the
upgrade to 13.10 in 2011 and runs multiple
data warehouse workloads at the same
time. It includes column-store analyses.
Teradata Integrated Analytics is a set
of tools for data analysis that resides
inside the data warehouse.
Data warehouse appliance
In computing, a data warehouse
appliance is a marketing term for an
integrated set of servers, storage,
Operating System(s), DBMS and
software specifically pre-installed and
pre-optimized for data warehousing
(DW). Alternatively, the term can also
apply to similar software-only systems
promoted as easy to install on specific
recommended hardware configurations
or preconfigured as a complete system.
DW appliances are marketed for
middle-to-big data applications, most
commonly on data volumes in the
terabyte to petabyte range.
Data warehouse appliance - Technology
Most DW appliances use massively
parallel processing (MPP)
architectures to provide high query
performance and platform scalability
Data warehouse appliance - History
MPP database architectures have a long
pedigree. Some consider Teradata's initial
product as the first DW appliance — or
Britton-Lee's. Teradata acquired Britton
Lee — renamed ShareBase — in June,
1990. Others disagree, considering
appliances as a "disruptive technology" for
Teradata
Open source and commodity
computing components aided a re-emergence of MPP data warehouse
appliances.
Other DW appliance vendors use
specialized hardware and advanced
software, instead of MPP architectures.
Netezza announced a "data appliance" in
2003, and used specialized field-programmable gate array hardware.
Kickfire followed in 2008 with what they
called a dataflow "sql chip".
In 2009 more DW
appliances emerged
The market has also seen the emergence
of data-warehouse bundles where vendors
combine their hardware and database
software together as a data warehouse
platform
Measure (data warehouse)
In a data warehouse, a measure is a
property on which calculations (e.g., sum,
count, average, minimum, maximum) can
be made.
Measure (data warehouse) - Example
For example, if a retail store sold a
specific product, the quantity and
prices of each item sold could be
added or averaged to find the total
number of items sold and total or
average price of the goods.
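The retail example can be worked through in a few lines. The sales rows below are invented for illustration; the point is only that quantity and price are measures because sums and averages over them are meaningful.

```python
from statistics import mean

# Hypothetical sales of one product: each row is one sale.
sales = [
    {"quantity": 2, "unit_price": 4.0},
    {"quantity": 1, "unit_price": 6.0},
    {"quantity": 3, "unit_price": 5.0},
]

total_items = sum(row["quantity"] for row in sales)
total_revenue = sum(row["quantity"] * row["unit_price"] for row in sales)
average_price = mean(row["unit_price"] for row in sales)
```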
Measure (data warehouse) - Use of ISO representation terms
When entering data into a metadata
registry such as ISO/IEC 11179,
representation terms such as Number,
Value and Measure are typically used
as measures.
Aggregate (Data Warehouse)
ISBN 0-471-20024-7, page 356.
The reason why aggregates can make such a
dramatic increase in the performance of
the data warehouse is the reduction of the
number of rows to be accessed when
responding to a query (Christopher
Adamson, Mastering Data Warehouse
Aggregates: Solutions for Star Schema
Performance, Wiley Publishing, Inc., 2006,
ISBN 978-0-471-77709-0, page 23).
Ralph Kimball, who is widely
regarded as one of the original
architects of data warehousing, says:
The single most dramatic way to affect
performance in a large data warehouse is
to provide a proper set of aggregate
(summary) records that coexist with the
primary base records. Aggregates can
have a very significant effect on
performance, in some cases speeding
queries by a factor of one hundred or even
one thousand. No other means exist to
harvest such spectacular gains.
Having aggregates and atomic data
increases the complexity of the
dimensional model. This complexity
should be transparent to the users of the
data warehouse, thus when a request is
made, the data warehouse should return
data from the table with the correct grain.
So when requests to the data warehouse
are made, aggregate navigator
functionality should be implemented, to
The best way to choose this subset and
decide which aggregations to build is
to monitor queries and design
aggregations to match query
patterns (Ralph Kimball et al., The Data
Warehouse Toolkit, Second Edition,
Wiley Publishing, Inc., 2008).
Aggregate (Data Warehouse) - Aggregate navigator
Aggregate navigation essentially
examines the query to see if it can be
answered using a smaller, aggregate
table (Ralph Kimball et al., The Data
Warehouse Toolkit, Second Edition, Wiley
Publishing, Inc., 2008).
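A toy sketch of the navigation step, assuming a hypothetical catalog that records which attributes each table can answer and how many rows it holds. Real aggregate navigators rewrite SQL; this only shows the table-selection logic.

```python
# Hypothetical catalog: base fact table plus two aggregate tables.
TABLES = [
    {
        "name": "sales_fact",
        "attributes": {"day", "month", "year", "product", "category", "store"},
        "rows": 50_000_000,
    },
    {
        "name": "sales_agg_month_category",
        "attributes": {"month", "year", "category"},
        "rows": 12_000,
    },
    {"name": "sales_agg_year", "attributes": {"year"}, "rows": 10},
]


def choose_table(requested_attributes):
    """Pick the smallest table whose attributes cover everything the query needs."""
    candidates = [t for t in TABLES if requested_attributes <= t["attributes"]]
    if not candidates:
        raise LookupError("no table can answer this query")
    return min(candidates, key=lambda t: t["rows"])["name"]
```

For instance, `choose_table({"year", "category"})` selects the monthly category aggregate, while a query that needs `store` falls back to the base fact table.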
Implementations of aggregate navigators can
be found in a range of technologies:
* Business intelligence (BI)
application servers or query tools
It is generally recommended to use one
of the first three technologies, since the
benefits in the latter case are restricted to a
single front-end business intelligence (BI)
tool (Ralph Kimball et al., The Data
Warehouse Toolkit, Second Edition, Wiley
Publishing, Inc., 2008, ISBN 978-0-47014977-5, page 354).
Aggregate (Data Warehouse) - Problems/challenges
* Since dimensional models only gain
from aggregates on large data sets, at
what size of the data sets should one
start considering using aggregates?
* Similarly, is a data warehouse
always handling data sets that are too
large for direct queries, or is it
sometimes a good idea to omit the
aggregate tables when starting a new
data warehouse project? That is, will
omitting aggregates in the first
iteration of building a new data
warehouse make the structure of the
dimensional model simpler?
Fact (data warehouse)
In computing, a 'data warehouse' or
'enterprise data warehouse' ('DW', 'DWH',
or 'EDW') is a database used for
reporting and data analysis. It is
a central repository of data which is
created by integrating data from one or
more disparate sources. Data warehouses
store current as well as historical data and
are used for creating trending reports for
senior management reporting such as
The data stored in the warehouse are
uploaded from the operational systems
(such as marketing and sales). The data
may pass through an operational data
store for additional operations before
they are used in the DW for reporting.
The integrated data are then moved to
yet another database, often called the
data warehouse database, where the
data is arranged into hierarchical
groups often called dimensions and
into facts and aggregate facts
This integrated data warehouse
architecture supports the drill down from
the aggregate data of the data warehouse
to the transactional data of the integrated
source data systems.
A data mart is a small data warehouse
focused on a specific area of interest.
Data warehouses can be subdivided
into data marts for improved
performance and ease of use within
that area. Alternatively, an organization
can create one or more data marts as
first steps towards a larger and more
complex enterprise data warehouse.
This definition of the data
warehouse focuses on
data storage
Fact (data warehouse) - Benefits of a data warehouse
A data warehouse maintains a copy of
information from the source
transaction systems. This
architectural complexity provides the
opportunity to:
* Congregate data from multiple sources
into a single database so a single query
engine can be used to present data.
* Mitigate the problem of database
isolation level lock contention in
transaction processing systems
caused by attempts to run large, long
running, analysis queries in
transaction processing databases.
* Maintain data history, even if the
source transaction systems do not.
* Integrate data from multiple source
systems, enabling a central view
across the enterprise. This benefit is
always valuable, but particularly so
when the organization has grown by
merger.
* Improve data quality, by providing
consistent codes and descriptions,
flagging or even fixing bad data.
* Provide a single common data model for all data of
interest regardless of the data's source.
* Restructure the data so
that it makes sense to the
business users.
* Restructure the data so that it delivers
excellent query performance, even for
complex analytic queries, without
impacting the operational systems.
* Add value to operational business
applications, notably customer relationship
management (CRM) systems.
Fact (data warehouse) - Generic data warehouse environment
The environment for
data warehouses and
marts includes the
following:
* Source systems that
provide data to the
warehouse or mart;
*Data integration technology and processes
that are needed to prepare the data for use;
*Different architectures for storing data in an
organization's data warehouse or data marts;
*Different tools and
applications for the
variety of users;
*Metadata, data quality, and governance
processes must be in place to ensure
that the warehouse or mart meets its
purposes.
With regard to the source systems listed
above, Rainer states, “A common
source for the data in data warehouses
is the company’s operational
databases, which can be relational
databases”.
Regarding data integration, Rainer states,
“It is necessary to extract data from source
systems, transform them, and load them
into a data mart or warehouse”.
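The extract, transform and load steps in the quotation can be sketched as three small functions. This is a minimal illustration using an in-memory SQLite database as the "warehouse"; the source rows, table name and cleaning rules are assumptions.

```python
import sqlite3


def extract():
    """Stand-in for reading rows from an operational source system."""
    return [
        {"order_id": "1", "country": " us ", "amount": "19.90"},
        {"order_id": "2", "country": "DE", "amount": "5"},
    ]


def transform(rows):
    """Cast types and make country codes consistent."""
    return [
        (int(r["order_id"]), r["country"].strip().upper(), float(r["amount"]))
        for r in rows
    ]


def load(conn, rows):
    """Write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS order_fact "
        "(order_id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO order_fact VALUES (?, ?, ?)", rows)
    conn.commit()


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract()))
loaded = warehouse.execute(
    "SELECT order_id, country, amount FROM order_fact ORDER BY order_id"
).fetchall()
```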
Rainer discusses storing data in an
organization’s data warehouse or data
marts. “There are a variety of possible
architectures to store decision-support data”.
Metadata are data about data. “IT
personnel need information about
data sources; database, table, and
column names; refresh schedules; and
data usage measures”.
Today, the most successful companies
are those that can respond quickly and
flexibly to market changes and
opportunities. A key to this response is
the effective and efficient use of data
and information by analysts and
managers. A “data warehouse” is a
repository of historical data that are
organized by subject to support
decision makers in the organization.
Fact (data warehouse) - History
The concept of data warehousing
dates back to the late 1980s when IBM
researchers Barry Devlin and Paul
Murphy developed the business data
warehouse
Key developments in early years
of data warehousing were:
* 1960s— General Mills and
Dartmouth College, in a joint research
project, develop the terms dimensions
and facts (Kimball 2002, p. 16).
* 1970s— Bill Inmon begins to
define and discuss the term: Data
Warehouse
* 1975— Sperry Univac Introduce
MAPPER (MAintain, Prepare, and
Produce Executive Reports), a
database management and reporting
system that includes the world's first
4GL. It was the first platform designed
for building Information Centers (a
forerunner of contemporary
Enterprise Data Warehousing
platforms)
* 1983— Teradata introduces a database
management system specifically designed
for decision support.
* 1983— Sperry Corporation Martyn
Richard Jones defines the Sperry
Information Center approach, which
while not being a true DW in the
Inmon sense, did contain many of the
characteristics of DW structures and
process as defined previously by
Inmon, and later by Devlin. First used
at the TSB
England & Wales.
* 1984— Metaphor Computer Systems,
founded by David Liddle and Don
Massaro, releases Data Interpretation
System (DIS). DIS was a
hardware/software package and GUI for
business users to create a database
management and analytic system.
* 1988— Barry Devlin and Paul Murphy
publish the article "An architecture
for a business and information system"
(http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=arnumber=5387658) in
IBM Systems Journal, where they
introduce the term business data
warehouse.
* 1990— Red Brick Systems, founded
by Ralph Kimball, introduces Red
Brick Warehouse, a database
management system specifically for
data warehousing.
* 1991— Prism Solutions, founded by
Bill Inmon, introduces Prism
Warehouse Manager, software for
developing a data warehouse.
* 1992— Bill Inmon publishes
the book Building the Data
Warehouse.
* 1995— The Data Warehousing
Institute, a for-profit organization that
promotes data warehousing, is
founded.
* 1996— Ralph Kimball
publishes the book The Data
Warehouse Toolkit.
* 2000— Daniel Linstedt releases the
Data Vault, enabling real-time auditable
data warehouses.
Fact (data warehouse) - Facts
A fact is a value or measurement, which represents
a fact about the managed entity or system.
E.g., if a BTS (base transceiver station) received 1,000 requests for
traffic channel allocation, allocated
820 and rejected the remainder, then it
would report 3 'facts' or measurements to
a management system:
Facts at the raw level are further aggregated
to higher levels in various
dimensions to extract more
service or business-relevant information
from them. These are called aggregates or
summaries or aggregated facts.
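A short sketch of rolling raw facts up to a higher level of a dimension. The per-station rows, the region attribute and the three measure names are assumptions; the rejected count is derived as requests minus allocations, as in the 1,000 and 820 example.

```python
from collections import defaultdict

# Hypothetical raw facts reported per base station.
raw_facts = [
    {"bts": "BTS-1", "region": "North", "requests": 1000, "allocated": 820},
    {"bts": "BTS-2", "region": "North", "requests": 500, "allocated": 450},
    {"bts": "BTS-3", "region": "South", "requests": 300, "allocated": 300},
]


def aggregate_by(facts, level):
    """Sum the measures of all raw facts that share a value at the given level."""
    summary = defaultdict(lambda: {"requests": 0, "allocated": 0, "rejected": 0})
    for row in facts:
        bucket = summary[row[level]]
        bucket["requests"] += row["requests"]
        bucket["allocated"] += row["allocated"]
        bucket["rejected"] += row["requests"] - row["allocated"]
    return dict(summary)


by_region = aggregate_by(raw_facts, "region")
```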
Fact (data warehouse) - Dimensional vs. normalized approach for storage of data
There are three or more leading
approaches to storing data in a data
warehouse— the most important
approaches are the dimensional approach
and the normalized approach.
Supporters of the dimensional approach,
referred to as “Kimballites”,
believe in Ralph Kimball’s approach, in
which it is stated that the data warehouse
should be modeled using a dimensional
model/star schema. Supporters of the normalized
approach, also called the 3NF model
(Third Normal Form),
are referred to as “Inmonites” and believe in
Bill Inmon's approach in which it is stated
In a dimensional
approach, transaction data are
partitioned into facts, which are
generally numeric transaction data,
and dimensions, which are
the reference information that gives
context to the facts.
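The partitioning can be illustrated by splitting each incoming transaction into a numeric fact row and references to dimension rows. The field names, the in-memory dictionaries and the surrogate-key scheme are assumptions made for this sketch.

```python
def get_or_create_key(dimension, attributes):
    """Return the surrogate key for a dimension row, adding the row if it is new."""
    natural = tuple(sorted(attributes.items()))
    if natural not in dimension:
        dimension[natural] = len(dimension) + 1
    return dimension[natural]


product_dim, customer_dim, fact_table = {}, {}, []


def load_transaction(tx):
    """Split one transaction into dimension references and numeric facts."""
    fact_table.append({
        "product_key": get_or_create_key(
            product_dim, {"name": tx["product"], "category": tx["category"]}
        ),
        "customer_key": get_or_create_key(
            customer_dim, {"name": tx["customer"], "city": tx["city"]}
        ),
        "quantity": tx["quantity"],
        "amount": tx["amount"],
    })


transactions = [
    {"product": "Widget", "category": "Hardware", "customer": "Ann", "city": "Oslo", "quantity": 2, "amount": 8.0},
    {"product": "Manual", "category": "Books", "customer": "Bob", "city": "Bergen", "quantity": 1, "amount": 12.0},
    {"product": "Widget", "category": "Hardware", "customer": "Bob", "city": "Bergen", "quantity": 3, "amount": 12.0},
]
for tx in transactions:
    load_transaction(tx)
```

Descriptive text is stored once per product or customer, while the fact table keeps only keys and numbers.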
Also, the retrieval of data from the data
warehouse tends to operate very quickly
# In order to maintain the integrity of
facts and dimensions, loading the data
warehouse with data from different
operational systems is complicated,
and
# It is difficult to modify the data
warehouse structure if the
organization adopting the
dimensional approach changes the
way in which it does business.
In the normalized approach, the data in
the data warehouse are stored following,
to a degree, database normalization rules
The main advantage of this approach is
that it is straightforward to add
information into the database. A
disadvantage of this approach is that,
because of the number of tables
involved, it can be difficult for users
both to:
# join data from different
sources into meaningful
information and then
# access the information without a precise
understanding of the sources of data and
of the data structure of the data
warehouse.
It should be noted that both normalized
and dimensional models can be
represented in entity-relationship diagrams
as both contain joined relational tables.
The difference between the two models is
the degree of normalization.
These approaches are not mutually
exclusive, and there are other
approaches. Dimensional approaches
can involve normalizing data to a
degree (Kimball, Ralph 2008).
In Information-Driven Business, Robert
Hillard proposes an approach to
comparing the two approaches based
on the information needs of the
business problem
Fact (data warehouse) - Bottom-up design
Ralph Kimball (Kimball 2002, p. 310)
designed an approach to data
warehouse design known as bottom-up.
In the bottom-up approach, data marts are
first created to provide reporting and
analytical capabilities for specific business
processes.
The data warehouse bus architecture is
primarily an implementation of the bus,
a collection of conformed
dimensions and conformed facts,
which are dimensions that are shared
(in a specific way) between facts in two
or more data marts.
The integration of the data marts in the
data warehouse is centered on the
conformed dimensions (residing in the
bus) that define the possible integration
points between data marts
Maintaining tight management over the
data warehouse bus architecture is
fundamental to maintaining the
integrity of the data warehouse. The
most important management task is
making sure dimensions among data
marts are consistent.
Business value can be returned as quickly
as the first data marts can be created, and
the method lends itself well to an
exploratory and iterative approach to
building data warehouses
If integration via the bus is achieved,
the data warehouse, through its two
data marts, will not only be able to
deliver the specific information that the
individual data marts are designed to
do, in this example either Sales or
Production information, but can deliver
integrated Sales-Production
information, which, often, is of critical
business value.
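The integration described above can be sketched as a drill-across query: each data mart is summarized separately by an attribute of the shared (conformed) date dimension, and the two answer sets are then merged on their matching row headers. This is only a minimal sketch; the table contents, key values and function names (`summarize`, `drill_across`) are illustrative assumptions, not part of the source material.

```python
from collections import defaultdict

# Hypothetical conformed date dimension shared by both data marts.
date_dim = {
    20240101: {"month": "2024-01"},
    20240102: {"month": "2024-01"},
    20240201: {"month": "2024-02"},
}

# Hypothetical fact rows: (date_key, measure value).
sales_facts = [(20240101, 100.0), (20240102, 50.0), (20240201, 70.0)]
production_facts = [(20240101, 8), (20240201, 5), (20240201, 4)]


def summarize(facts, dimension, attribute):
    """Total the fact measure per value of one dimension attribute."""
    totals = defaultdict(float)
    for key, value in facts:
        totals[dimension[key][attribute]] += value
    return dict(totals)


def drill_across(left, right):
    """Merge two answer sets whose row headers come from the same dimension."""
    headers = sorted(set(left) | set(right))
    return {h: (left.get(h, 0.0), right.get(h, 0.0)) for h in headers}


sales_by_month = summarize(sales_facts, date_dim, "month")
production_by_month = summarize(production_facts, date_dim, "month")
integrated = drill_across(sales_by_month, production_by_month)
```

Because both marts use the same date keys and month labels, the row headers line up without any translation step, which is the point of conforming the dimension.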
Fact (data warehouse) - Top-down design
Gartner released a research note
confirming Inmon's definition in 2005
(Gartner, Of Data Warehouses,
Operational Data Stores, Data Marts
and Data Outhouses, Dec 2005), with
additional clarity, and added one
additional attribute.
* Subject-oriented: The data in the data
warehouse is organized so that all the
data elements relating to the same
real-world event or object are linked
together.
* Non-volatile: Data in the data
warehouse are never over-written or
deleted; once committed, the data are
static, read-only, and retained for future
reporting.
* Integrated: The data warehouse
contains data from most or all of an
organization's operational systems and
these data are made consistent.
* Time-variant: For an 'operational system',
the stored data contains the current value.
The data warehouse, however, contains
the history of data values.
* No virtualization: A data warehouse is a
physical repository (Gartner, Of Data
Warehouses, Operational Data Stores,
Data Marts and Data Outhouses, Dec
2005).
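The non-volatile and time-variant attributes listed above can be illustrated with an append-only history table: new values are added as new rows, nothing is overwritten, and a query can ask for the value as of any date. The row layout and the helper names (`record`, `value_as_of`) are assumptions made for this sketch only.

```python
from datetime import date

# Append-only store: rows are added, never overwritten or deleted.
history = []


def record(rows, key, value, valid_from):
    """Append a new version of a value instead of updating in place."""
    rows.append({"key": key, "value": value, "valid_from": valid_from})


def value_as_of(rows, key, as_of):
    """Return the most recent value for key that was valid on as_of."""
    candidates = [
        r for r in rows if r["key"] == key and r["valid_from"] <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r["valid_from"])["value"]


record(history, "customer-1:city", "Leeds", date(2023, 1, 1))
record(history, "customer-1:city", "York", date(2024, 6, 1))

before_move = value_as_of(history, "customer-1:city", date(2023, 12, 31))
after_move = value_as_of(history, "customer-1:city", date(2024, 6, 1))
```

An operational system would typically keep only the latest city; here both versions remain available for reporting.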
Fact (data warehouse) - Top-down design
The up-front cost for implementing a data
warehouse using the top-down
methodology is significant, and the
duration of time from the start of project to
the point that end users experience initial
benefits can be substantial.
Fact (data warehouse) - Hybrid design
Data warehouse (DW) solutions often
resemble the hub and spokes architecture.
Fact (data warehouse) - Hybrid design
It is important to note that the DW
database in a hybrid solution is kept in
third normal form to eliminate data
redundancy.
Fact (data warehouse) - Hybrid design
The Data Vault model is geared to be
strictly a data warehouse.
Fact (data warehouse) - Data warehouses versus operational systems
Operational systems are optimized for
preservation of data integrity and
speed of recording of business
transactions through use of database
normalization and an entity-relationship
model.
Fact (data warehouse) - Evolution in organization use
These terms refer to the level of
sophistication of a data warehouse:
* Offline operational data warehouse: Data
warehouses in this stage of evolution are
updated on a regular time cycle (usually
daily, weekly or monthly) from the
operational systems and the data is stored
in an integrated reporting-oriented data
* Offline data warehouse: Data
warehouses at this stage are updated from
data in the operational systems on a
regular basis and the data warehouse data
are stored in a data structure designed to
facilitate reporting.
* On time data warehouse: Online
integrated data warehousing represents
the real-time data warehouse stage, in
which data in the warehouse is updated
for every transaction performed on the
source data.
* Integrated data warehouse: These data
warehouses assemble data from different
areas of business, so users can look up
the information they need across other
systems.
Data warehouses
In computing, a 'data warehouse'
('DW', 'DWH'), or an 'enterprise data
warehouse' ('EDW'), is a database
used for reporting and data analysis.
Integrating data from one or more
disparate sources creates a central
repository of data, a data warehouse
(DW). Data warehouses store current as
well as historical data and are used for
Data warehouses
The data stored in the warehouse are
uploaded from the operational systems
(such as marketing, sales, etc., shown
in the figure to the right). The data may
pass through an operational data store
for additional operations before they
are used in the DW for reporting.
Data warehouses - Benefits of a data warehouse
* Making decision-support queries
easier to write.
Dimension (data warehouse)
In a data warehouse, 'Dimensions'
provide structured labeling
information to otherwise unordered
numeric measures. The dimension is
a data set composed of individual,
non-overlapping data elements. The
primary functions of dimensions are
threefold: to provide filtering,
grouping and labeling.
Dimension (data warehouse)
A common data warehouse example
involves sales as the measure, with
customer and product as dimensions.
In each sale a customer buys a
product. The data can be sliced by
removing all customers except for a
group under study, and then diced by
grouping by product.
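The slice-and-dice example above can be sketched in a few lines. This is a minimal illustration, assuming hypothetical sales rows and helper names (`slice_by_customers`, `dice_by_product`) that do not come from the source.

```python
from collections import defaultdict

# Hypothetical sales facts: one row per sale (customer, product, amount).
sales = [
    {"customer": "alice", "product": "widget", "amount": 10.0},
    {"customer": "bob", "product": "widget", "amount": 20.0},
    {"customer": "alice", "product": "gadget", "amount": 5.0},
    {"customer": "carol", "product": "gadget", "amount": 7.5},
]


def slice_by_customers(rows, customers):
    """Slice: keep only the sales made by the customers under study."""
    return [row for row in rows if row["customer"] in customers]


def dice_by_product(rows):
    """Dice: group the remaining sales by product and total the measure."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["product"]] += row["amount"]
    return dict(totals)


study_group = {"alice", "carol"}
result = dice_by_product(slice_by_customers(sales, study_group))
```

Here the customer dimension does the filtering and the product dimension does the grouping and labeling, while the sale amount is the measure being summed.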
Dimension (data warehouse)
Typically dimensions in a data
warehouse are organized internally into
one or more hierarchies. Date is a
common dimension, with several
possible hierarchies:
* Days (are grouped into) Months (which
are grouped into) Years
* Days (are grouped into) Weeks (which
are grouped into) Years
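The two date hierarchies above can be derived from a single calendar day. The sketch below assumes ISO 8601 week numbering for the week-based hierarchy, which the source does not specify; under that assumption a day can roll up to a different year in each hierarchy. The function names are illustrative.

```python
from datetime import date


def calendar_hierarchy(day):
    """Roll a day up to its month and calendar year."""
    return (day.isoformat(), day.strftime("%Y-%m"), day.year)


def week_hierarchy(day):
    """Roll a day up to its ISO week and ISO week-numbering year."""
    iso_year, iso_week, _ = day.isocalendar()
    return (day.isoformat(), f"{iso_year}-W{iso_week:02d}", iso_year)


day = date(2024, 12, 30)
by_month = calendar_hierarchy(day)
by_week = week_hierarchy(day)
```

For 30 December 2024 the month hierarchy rolls up to 2024, while the ISO week hierarchy rolls up to week 1 of 2025, which is why the two hierarchies are usually stored as separate attributes of the date dimension.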
Dimension (data warehouse) - Conformed dimension
Dimensions are conformed when they
are either exactly the same (including
keys) or one is a perfect subset of the
other. Most important, the row
headers produced in two different
answer sets from the same conformed
dimension(s) must be able to match
perfectly.
Dimension (data warehouse) - Conformed dimension
The date dimension table connected to
the sales facts is identical to the date
dimension connected to the inventory
facts.
Ralph Kimball, Margy Ross, The
Data Warehouse Toolkit: The Complete
Guide to Dimensional Modeling, Second
Edition, Wiley Computer Publishing, 2002
Dimension (data warehouse) - Junk dimension
A junk dimension is a convenient grouping
of typically low-cardinality flags and
indicators. By creating an abstract
dimension, these flags and indicators are
removed from the fact table while placing
them into a useful dimensional
framework.
Ralph Kimball, Margy Ross,
The Data Warehouse Toolkit: The
Complete Guide to Dimensional Modeling,
Second Edition, Wiley Computer
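One common way to build the junk dimension described above is to enumerate every combination of the low-cardinality flags once and let each fact row carry a single key into that table. The flag names, their domains and the function name below are hypothetical and serve only to illustrate the idea.

```python
from itertools import product

# Hypothetical low-cardinality flags that would otherwise sit in the fact table.
flag_domains = {
    "is_gift": (False, True),
    "payment_type": ("card", "cash"),
    "is_rush": (False, True),
}


def build_junk_dimension(domains):
    """Create one dimension row per combination of flag values."""
    names = list(domains)
    rows = {}
    for key, combo in enumerate(product(*domains.values()), start=1):
        rows[key] = dict(zip(names, combo))
    return rows


junk_dim = build_junk_dimension(flag_domains)

# Reverse lookup from a combination of flag values to its surrogate key.
lookup = {tuple(row.values()): key for key, row in junk_dim.items()}

# The fact row stores one foreign key instead of three separate flags.
fact_row = {"amount": 42.0, "junk_key": lookup[(True, "cash", False)]}
```

With three flags of two values each, the dimension has only eight rows, so the fact table trades three columns for one small foreign key.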
Dimension (data warehouse) - Degenerate dimension
Degenerate dimensions often play an
integral role in the fact table's
primary key.
Ralph Kimball, Margy
Ross, The Data Warehouse Toolkit: The
Complete Guide to Dimensional
Modeling, Second Edition, Wiley
Computer Publishing, 2002
Dimension (data warehouse) - Role-playing dimension
Dimensions are often recycled for multiple
applications within the same database.
For instance, a Date dimension can be
used for Date of Sale, as well as Date of
Delivery, or Date of Hire. This is often
referred to as a role-playing dimension.
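A role-playing dimension can be sketched as one physical date table referenced by several foreign keys in the same fact row, each key naming a different role. The keys, column names and the `resolve_roles` helper are assumptions made for illustration.

```python
# One physical date dimension (hypothetical keys and attributes).
date_dim = {
    1: {"date": "2024-03-01", "weekday": "Friday"},
    2: {"date": "2024-03-04", "weekday": "Monday"},
}

# A fact row that references the same dimension twice, in two roles.
order_fact = {"order_id": "A-17", "sale_date_key": 1, "delivery_date_key": 2}


def resolve_roles(fact, dimension, roles):
    """Look up the shared dimension once per role named in the fact row."""
    return {role: dimension[fact[f"{role}_key"]] for role in roles}


resolved = resolve_roles(order_fact, date_dim, ["sale_date", "delivery_date"])
```

Only one date table is maintained; the role is carried by the name of the foreign key column, not by a separate copy of the dimension.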
Dimension (data warehouse) - Use of ISO representation terms
When referencing data from a metadata
registry such as ISO/IEC 11179,
representation terms such as 'Indicator' (a
boolean true/false value) and 'Code' (a set
of non-overlapping enumerated values) are
typically used as dimensions. For
example, using the National Information
Exchange Model (NIEM), the data element
name would be 'PersonGenderCode' and
the enumerated values would be 'male',
Dimension (data warehouse) - Common patterns
Date and time
Ralph Kimball, The Data Warehouse
Toolkit, Second Edition, Wiley
Publishing, Inc., 2008. ISBN
978-0-470-14977-5, pages 253-256
Measure (data warehouse)
In a data warehouse, a 'measure' is a
property on which calculations (e.g., sum,
count, average, minimum, maximum) can
be made.
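The calculations named above can be shown on a small list of measure values. The sample amounts and the `aggregate` helper are illustrative assumptions.

```python
from statistics import mean

# Hypothetical measure values, for example sale amounts from a fact table.
sale_amounts = [10.0, 20.0, 30.0]


def aggregate(values):
    """Apply the usual calculations to a list of measure values."""
    return {
        "sum": sum(values),
        "count": len(values),
        "average": mean(values),
        "minimum": min(values),
        "maximum": max(values),
    }


summary = aggregate(sale_amounts)
```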
Operational database - Data warehouse terminology
In data warehousing, the term is even
more specific: the operational
database is the one which is accessed
by an operational system (for example
a customer-facing website or the
application used by the customer
service department) to carry out regular
operations of an organization.
Operational databases usually use an
online transaction processing database
Operational database - Data warehouse terminology
The contents of the data warehouse
are derived from the operational
database, but not necessarily updated
in real time (if the machines are
separate). Data warehouses tend to be
optimized for faster read-only queries
as an online analytical processing
database to be used for back-office
applications like business
intelligence.
Real-time business intelligence - Data warehouse
An alternative approach to event driven
architectures is to increase the refresh
frequency of an existing data warehouse
to update the data more often. These
real-time data warehouse systems can
achieve near real-time update of data,
where the data latency typically is in the
range from minutes to hours. The analysis
of the data is still usually manual, so the
total latency is significantly different from
event driven architectural approaches.
Kalido - Generic Modeling and the Data Warehouse
The generic structure, compared to the
traditional data warehouse design
based on third normal form schemas
and snowflake or star schemas, has
both advantages and disadvantages.
Kalido - Generic Modeling and the Data Warehouse
* The generic structure can store
time-variant business context data
(i.e., changes to the business context
data that happen over time such as a
reorganization where departments are
grouped differently), without requiring any
database design changes.
Kalido - Generic Modeling and the Data Warehouse
* The generic structure presents a highly
standardized approach to loading and
retrieval, enabling the automatic creation
of loading and retrieval routines by Kalido
DIW.
Kalido - Generic Modeling and the Data Warehouse
* The generic structure enables the
loading of new classes of data through
the simple addition of a few records of
metadata. Conventionally, changes in
requirements cause changes to the
design, requiring a database
administrator to alter the table
structure of the warehouse and to
reorganize the data in the database.
The costs and time involved can be
Kalido - Generic Modeling and the Data Warehouse
* The generic structure allows the
capture of complex business rules
that are difficult to capture using a
conventional relational structure.
Kalido - Generic Modeling and the Data Warehouse
* The use of metadata allows
the structure of business context and
transaction data to be easily understood
by business users.
Kalido - Generic Modeling and the Data Warehouse
A pure implementation of generic modeling
principles will bring with it some
disadvantages such as:
Kalido - Generic Modeling and the Data Warehouse
* Conventional star schema can give
better performance than physical
implementations of the generic
structure. Kalido DIW addresses these
issues by combining elements of the
generic structure with those of a star
schema.
Kalido - Generic Modeling and the Data Warehouse
* The generic structure supports the
business structure by holding multiple
rows, linked by pointers, instead of the
conventional columns in a table. This
makes the data difficult to read and the
SQL difficult to write, requiring a
code-generating front-end to read and
load data. Kalido DIW has such a
code-generating front-end.
Kalido - Generic Modeling and the Data Warehouse
Despite the generic structure being
different from conventional designs, it is
far easier to query once understood as it
combines the business metadata
dictionary with the business context data.
Finding out where something is stored is
far simpler than navigating through
hundreds of obscure tables.
Kalido - Generic Modeling and the Data Warehouse
Given the above advantages and
disadvantages, a mix of the generic design
for business context data and the star
schema for transaction data and retrieval
would make an ideal situation. This has
been the basis for the physical
implementation of Kalido DIW. The results
of the Kalido implementation have proved
that this innovative design can, and does,
work. Kalido has UK patents on this
design. The generic design of Kalido DIW
is highly flexible but could have made
processing transactions against the
hierarchies of business context data
rather inefficient. To improve
performance, the complex hierarchies are
automatically flattened out by Kalido DIW
to create mapping tables.
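Flattening a hierarchy into a mapping table can be sketched as follows. This is a generic illustration of the technique, not Kalido's actual implementation: it assumes a simple child-to-parent structure, emits one row per ancestor and descendant pair, and leaves out the date and time stamping of changes. All names are hypothetical.

```python
# Hypothetical parent-child hierarchy: child -> parent (None marks the root).
parent_of = {
    "company": None,
    "europe": "company",
    "sales_uk": "europe",
    "sales_fr": "europe",
}


def flatten(hierarchy):
    """Return (ancestor, descendant, depth) rows for every pair."""
    rows = []
    for node in hierarchy:
        ancestor, depth = node, 0
        # Walk up from the node to the root, emitting one row per level.
        while ancestor is not None:
            rows.append((ancestor, node, depth))
            ancestor = hierarchy[ancestor]
            depth += 1
    return rows


mapping_table = flatten(parent_of)

# All units that roll up to "europe", found with a flat filter.
under_europe = sorted(d for a, d, _ in mapping_table if a == "europe")
```

Once the hierarchy is flattened, rolling transactions up to any level becomes a plain join against the mapping table instead of a recursive walk through linked rows.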
Kalido - Generic Modeling and the Data Warehouse
These mapping tables are complex and
contain the full structure of the business
context data hierarchies, including the
date and time stamping of changes.
Kalido - Generic Modeling and the Data Warehouse
The creation of mapping tables makes a
Kalido warehouse appear like any other
star schema. Conventional star schemas
include the business context data, but
hold it in keyed reference tables with all
the attributes, classifications, etc. as
columns. This causes duplication of data
and difficulty in maintenance, but is fast
to process. This is why the Kalido
warehouse can equal the query
performance of a conventional design.
The creation of the mapping tables can
be a scheduled task or the user can
initiate it. Batch tasks can also be used
for business context data loading,
transaction loading, summary
generation, mapping table generation,
data mart building, or export of
transaction or business context data.
Kalido - Generic Modeling and the Data Warehouse
Data marts are generated by extracting
information from the warehouse in a form
that can be analyzed using tools such as
Excel or BusinessObjects to slice and
dice, or drill down through it. The data
mart can be separated from the database,
and small ones can take the form of Excel
pivot tables, which can be taken away
on a portable computer for offline
analysis.
Kalido - Generic Modeling and the Data Warehouse
In summary, one of the requirements
of a data warehouse is that it should be
capable of storing and managing
almost any data from any source.
In a Kalido warehouse:
* Information is held in a neutral format, i.e. not
limited to a particular type of business data.
* There are neutral formats for
transaction data and business
context data.
Metadata is used for:
* validation and loading of
data into the warehouse
* structuring data in the
warehouse
The neutral formats allow you to select and view
information as you want in data marts.
College football national championships in NCAA Division I FBS - College Football Data Warehouse recognized national champions
These include the National Championship
Foundation (1869–1882), the Helms Athletic
Foundation (1883–1935), the College Football
Researchers Association (1919–1935), the
Associated Press Poll (1936–present), and the
Coaches Poll (1950–present).
[http://www.cfbdatawarehouse.com/data/national_championships/index.php
College Football Data Warehouse: National
Championships, accessdate=2009-01-30]
From its research, it has compiled a list of
Recognized National Championships for each
season.
[http://www.cfbdatawarehouse.com/data/national_championships/year_by_year.php
College Football Data Warehouse: National
Championships by Year,
accessdate=2014-01-07]
Some years include recognition of multiple
Information Framework - IBM Banking and Financial Markets Data Warehouse (BFMDW)
The banking and financial markets industry is
tackling three core challenges head on.
The IBM Banking and Financial Markets
Data Warehouse typically supports
approximately 80% of business
requirements and can be easily
customized and extended to cover the
specific requirements of a financial
institution. It assists a financial
institution in implementing a
flexible, reusable, extensible and
easily customizable architecture,
which enables organizations to:
* Increase adaptivity and respond faster
to changing customer needs
* Accelerate time to value in the modeling,
design and deployment phases of a project
* Reduce project time and costs through
proven design templates
* Strengthen business/technology linkage
* Focus on achieving competitive
differentiation
The BFMDW has a wealth of content;
for example, the product contains
more than 140 Analytical
Requirements covering seven
business focus areas.
Information Framework - IBM Banking Data Warehouse (BDW)
The BDW is a derivative of the BFMDW and
contains only content relevant to the banking
industry.
Information Framework - IBM Financial Markets Data Warehouse (FMDW)
The FMDW is a derivative of the
BFMDW and contains only content
relevant to the financial markets
industry.
For More Information, Visit:
• https://store.theartofservice.com/the-data-warehouse-toolkit.html
The Art of Service
https://store.theartofservice.com