Data Warehouse Architecture
The main difference between the database architecture of a standard online transaction processing (OLTP)
system (usually an ERP or CRM system) and that of a data warehouse is that the warehouse's relational model is
usually de-normalized into dimension and fact tables, which are typical of a data warehouse database design.
These differences in database architecture follow from the different purposes the two kinds of system serve.
In a typical OLTP system, database performance is crucial, as the responsiveness of the end-user interface is
one of the most important factors determining the usefulness of the application. Such a database needs to
handle inserting thousands of new records every hour. To achieve this, the database is usually optimized for
the speed of inserts, updates and deletes and for holding as few records as possible. So, from a technical
point of view, most of the SQL statements issued will be INSERT, UPDATE and DELETE.
In contrast to OLTP systems, a data warehouse is a system that should be able to answer almost any
question about company performance measures. The information delivered from a data warehouse is usually
used by people who are in charge of making decisions. The information should therefore be quickly
accessible, but the data does not have to be the freshest possible or at the lowest level of detail:
usually a data warehouse is refreshed daily, very often overnight.
The main challenge of data warehouse architecture is to store historical, summarized data to which end
users have read-only access. Again, from a technical standpoint, most SQL queries will start with a
SELECT statement.
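The contrast between the two workloads can be sketched with Python's built-in sqlite3 module standing in for a real database server. This is only an illustration under stated assumptions: the table names orders and daily_sales, their columns and the sample rows are hypothetical and do not come from the original text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# OLTP side (hypothetical table): many small writes against current records.
cur.execute("""
    CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        customer   TEXT NOT NULL,
        order_date TEXT NOT NULL,
        amount     REAL NOT NULL
    )
""")
cur.executemany(
    "INSERT INTO orders (customer, order_date, amount) VALUES (?, ?, ?)",
    [
        ("ACME", "2024-01-01", 100.0),
        ("ACME", "2024-01-01", 50.0),
        ("Globex", "2024-01-02", 70.0),
    ],
)
cur.execute("UPDATE orders SET amount = 60.0 WHERE order_id = 2")
cur.execute("DELETE FROM orders WHERE order_id = 3")

# Warehouse side (hypothetical table): a periodic load, for example nightly,
# stores summarized data that end users afterwards only read.
cur.execute("""
    CREATE TABLE daily_sales AS
    SELECT order_date, customer, SUM(amount) AS turnover
    FROM orders
    GROUP BY order_date, customer
""")
conn.commit()

# A typical warehouse query is a read-only SELECT.
summary = cur.execute(
    "SELECT order_date, customer, turnover FROM daily_sales "
    "ORDER BY order_date, customer"
).fetchall()
print(summary)
```

The write statements touch individual rows, while the warehouse query reads pre-aggregated data; in a real environment the two tables would normally live in separate systems.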
In data warehouse environments, the relational model can be transformed into the following architectures:
• Star schema
• Snowflake schema
• Constellation schema
Star schema architecture
Star schema architecture is the simplest data warehouse design. Its main feature is a table at the center,
called the fact table, surrounded by dimension tables, which allow browsing of specific categories,
summarizing, drill-downs and specifying criteria.
Typically, most of the fact tables in a star schema are in third normal form, while dimension
tables are de-normalized (second normal form).
Fact table
The fact table is not a typical relational database table, as it is de-normalized on purpose to improve query
response times. The fact table typically contains records that are ready to explore, usually with ad hoc
queries. Records in the fact table are often referred to as events, due to the time-variant nature of a data
warehouse environment.
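The primary key of the fact table is a composite of all its columns except the measures and other non-key
details (such as QUANTITY, TURNOVER, or the exact invoice date and time). In practice, the key columns are
the ones that reference the dimension tables.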
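A minimal sketch of such a composite key, again using Python's sqlite3 module. The table sales_fact and the columns date_id, customer_id and product_id are assumptions made for illustration; only QUANTITY and TURNOVER are named in the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sales_fact (
        date_id     INTEGER NOT NULL,
        customer_id INTEGER NOT NULL,
        product_id  INTEGER NOT NULL,
        quantity    INTEGER NOT NULL,   -- measure, not part of the key
        turnover    REAL    NOT NULL,   -- measure, not part of the key
        PRIMARY KEY (date_id, customer_id, product_id)
    )
""")

conn.execute("INSERT INTO sales_fact VALUES (20240101, 1, 10, 5, 49.5)")
# Same date and customer but a different product: a distinct key, so accepted.
conn.execute("INSERT INTO sales_fact VALUES (20240101, 1, 11, 2, 30.0)")

# Same key columns with different measures: rejected by the composite key.
try:
    conn.execute("INSERT INTO sales_fact VALUES (20240101, 1, 10, 9, 99.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
print(duplicate_rejected, row_count)
```

The measures can take any value, but a given combination of dimension keys can appear only once, which is what makes each fact row a single identifiable event.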
Typical fact tables in a global enterprise data warehouse are listed below (apart from these, there may be
some company-specific or business-specific fact tables):
• sales fact table - contains all details regarding sales
• orders fact table - in some cases the table can be split into open orders and historical orders. Sometimes
the values for historical orders are stored in a sales fact table.
• budget fact table - usually grouped by month and loaded once at the end of a year.
• forecast fact table - usually grouped by month and loaded daily, weekly or monthly.
• inventory fact table - reports stock levels, usually refreshed daily
Dimension table
Nearly all of the information in a typical fact table is also present in one or more dimension tables. The main
purpose of maintaining dimension tables is to allow browsing the categories quickly and easily.
The primary keys of the dimension tables are combined to form the composite primary key of
the fact table. In a star schema design, there is only one de-normalized table for a given dimension.
Typical dimension tables in a data warehouse are:
• time dimension table
• customers dimension table
• products dimension table
• key account managers (KAM) dimension table
• sales office dimension table
Star schema example
An example of a star schema architecture is depicted below.
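As a textual counterpart to the diagram, a star schema of the kind described above can also be sketched in code. The following uses Python's sqlite3 module; all table names, column names and sample rows are hypothetical, and only three of the dimensions listed earlier are included to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE time_dim (
        date_id INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER
    );
    CREATE TABLE customer_dim (
        customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT
    );
    -- De-normalized: brand and category sit in the same row as the product.
    CREATE TABLE product_dim (
        product_id INTEGER PRIMARY KEY, name TEXT, brand TEXT, category TEXT
    );
    -- Fact table at the center: dimension keys plus measures.
    CREATE TABLE sales_fact (
        date_id     INTEGER REFERENCES time_dim(date_id),
        customer_id INTEGER REFERENCES customer_dim(customer_id),
        product_id  INTEGER REFERENCES product_dim(product_id),
        quantity    INTEGER,
        turnover    REAL,
        PRIMARY KEY (date_id, customer_id, product_id)
    );

    INSERT INTO time_dim VALUES
        (20240115, '2024-01-15', '2024-01', 2024),
        (20240220, '2024-02-20', '2024-02', 2024);
    INSERT INTO customer_dim VALUES (1, 'ACME', 'PL'), (2, 'Globex', 'DE');
    INSERT INTO product_dim VALUES
        (10, 'Espresso beans', 'Brand A', 'Coffee'),
        (11, 'Green tea', 'Brand B', 'Tea');
    INSERT INTO sales_fact VALUES
        (20240115, 1, 10, 5, 50.0),
        (20240115, 2, 10, 3, 30.0),
        (20240220, 1, 11, 4, 20.0);
""")

# Summarize turnover by month and product category: one join per dimension.
rows = conn.execute("""
    SELECT t.month, p.category, SUM(f.turnover) AS turnover
    FROM sales_fact f
    JOIN time_dim t    ON t.date_id = f.date_id
    JOIN product_dim p ON p.product_id = f.product_id
    GROUP BY t.month, p.category
    ORDER BY t.month, p.category
""").fetchall()
print(rows)
```

Every dimension is reached from the fact table with a single join, which is what keeps star schema queries comparatively simple.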
Snowflake Schema architecture
Snowflake schema architecture is a more complex variation of the star schema design. The main difference is
that dimension tables in a snowflake schema are normalized, so they have a typical relational database
design.
Snowflake schemas are generally used when a dimension table becomes very big and when a star schema
cannot represent the complexity of a data structure. For example, if a PRODUCT dimension table contains
millions of rows, a snowflake schema can improve performance by moving some of the data out to
another table (BRANDS, for instance).
The drawback is that the more normalized a dimension table is, the more complicated the SQL joins needed
to query it become. To answer a query, many tables need to be joined and
aggregates generated.
An example of a snowflake schema architecture is depicted below.
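The PRODUCT and BRANDS example mentioned above can be sketched as follows with Python's sqlite3 module. The table names product_dim and brand_dim, the columns and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Brand attributes are moved out of the product dimension.
    CREATE TABLE brand_dim (
        brand_id INTEGER PRIMARY KEY, brand_name TEXT
    );
    CREATE TABLE product_dim (
        product_id INTEGER PRIMARY KEY,
        name       TEXT,
        brand_id   INTEGER REFERENCES brand_dim(brand_id)
    );
    CREATE TABLE sales_fact (
        date_id    INTEGER,
        product_id INTEGER REFERENCES product_dim(product_id),
        turnover   REAL,
        PRIMARY KEY (date_id, product_id)
    );

    INSERT INTO brand_dim VALUES (1, 'Brand A'), (2, 'Brand B');
    INSERT INTO product_dim VALUES
        (10, 'Espresso beans', 1),
        (11, 'Decaf beans', 1),
        (12, 'Green tea', 2);
    INSERT INTO sales_fact VALUES
        (20240115, 10, 50.0),
        (20240115, 11, 25.0),
        (20240115, 12, 20.0);
""")

# Reaching the brand now takes two joins instead of one.
turnover_by_brand = conn.execute("""
    SELECT b.brand_name, SUM(f.turnover) AS turnover
    FROM sales_fact f
    JOIN product_dim p ON p.product_id = f.product_id
    JOIN brand_dim b   ON b.brand_id = p.brand_id
    GROUP BY b.brand_name
    ORDER BY b.brand_name
""").fetchall()
print(turnover_by_brand)
```

Each brand name is stored once rather than repeated on every product row, at the cost of the additional join in every query that groups or filters by brand.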
Fact constellation schema architecture
For each star schema or snowflake schema it is possible to construct a fact constellation schema.
This schema is more complex than the star or snowflake architecture because it contains multiple fact
tables. This allows dimension tables to be shared among many fact tables.
The solution is very flexible, but it may be hard to manage and support.
The main disadvantage of the fact constellation schema is its more complicated design, because many variants
of aggregation must be considered.
In a fact constellation schema, different fact tables are explicitly assigned to the dimensions that are
relevant for the given facts. This may be useful when some facts are associated with a given dimension
level and other facts with a deeper dimension level.
This model is a reasonable choice when, for example, there is a sales fact table (with details down to the
exact date and invoice header id) and a sales forecast fact table calculated by month,
client id and product id.
In that case, the two fact tables at different levels of grouping are combined in a fact
constellation model.
An example of a constellation schema architecture is depicted below.
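The sales and forecast case described above can be sketched with Python's sqlite3 module. The table names, column names and figures are hypothetical; the month is stored here as a plain text column rather than through a separate time dimension, purely to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimensions shared by both fact tables.
    CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_dim  (product_id  INTEGER PRIMARY KEY, name TEXT);

    -- Detailed facts: one row per invoice and product, with the exact date.
    CREATE TABLE sales_fact (
        invoice_id  INTEGER NOT NULL,
        sale_date   TEXT    NOT NULL,
        customer_id INTEGER NOT NULL REFERENCES customer_dim(customer_id),
        product_id  INTEGER NOT NULL REFERENCES product_dim(product_id),
        turnover    REAL    NOT NULL,
        PRIMARY KEY (invoice_id, product_id)
    );

    -- Coarser facts: one row per month, customer and product.
    CREATE TABLE forecast_fact (
        month             TEXT    NOT NULL,
        customer_id       INTEGER NOT NULL REFERENCES customer_dim(customer_id),
        product_id        INTEGER NOT NULL REFERENCES product_dim(product_id),
        forecast_turnover REAL    NOT NULL,
        PRIMARY KEY (month, customer_id, product_id)
    );

    INSERT INTO customer_dim VALUES (1, 'ACME');
    INSERT INTO product_dim VALUES (10, 'Espresso beans'), (11, 'Green tea');
    INSERT INTO sales_fact VALUES
        (1001, '2024-01-05', 1, 10, 50.0),
        (1002, '2024-01-20', 1, 10, 30.0),
        (1003, '2024-01-21', 1, 11, 20.0);
    INSERT INTO forecast_fact VALUES
        ('2024-01', 1, 10, 100.0),
        ('2024-01', 1, 11, 15.0);
""")

# Roll the detailed sales up to the forecast grain, then compare the two.
comparison = conn.execute("""
    SELECT fc.month, p.name, fc.forecast_turnover,
           COALESCE(SUM(s.turnover), 0) AS actual_turnover
    FROM forecast_fact fc
    JOIN product_dim p ON p.product_id = fc.product_id
    LEFT JOIN sales_fact s
           ON s.customer_id = fc.customer_id
          AND s.product_id  = fc.product_id
          AND substr(s.sale_date, 1, 7) = fc.month
    GROUP BY fc.month, fc.customer_id, fc.product_id,
             p.name, fc.forecast_turnover
    ORDER BY fc.month, p.name
""").fetchall()
print(comparison)
```

Both fact tables reference the same customer and product dimensions, but the sales rows have to be aggregated to month level before they can be compared with the forecast, which illustrates why aggregation variants need care in this design.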