Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
DataWarehouse Architecture The main difference between the database architecture in a standard, on-line transaction processing oriented system (usually ERP or CRM system) and a DataWarehouse is that the system’s relational model is usually de-normalized into dimension and fact tables which are typical to a data warehouse database design. The differences in the database architectures are caused by different purposes of their existence. In a typical OLTP system the database performance is crucial, as end-user interface responsiveness is one of the most important factors determining usefulness of the application. That kind of a database needs to handle inserting thousands of new records every hour. To achieve this usually the database is optimized for speed of Inserts, Updates and Deletes and for holding as few records as possible. So from a technical point of view most of the SQL queries issued will be INSERT, UPDATE and DELETE. Opposite to OLTP systems, a DataWarehouse is a system that should give response to almost any question regarding company performance measure. Usually the information delivered from a data warehouse is used by people who are in charge of making decisions. So the information should be accessible quickly but the thing is that the data doesn’t have to be the freshest possible and in a lowest detail level – usually a data warehouse is refreshed on a daily basis and very often overnight. The main challenge of DataWarehouse architecture is to store historical, summarized data with a read-only access of the end-users. Again, from a technical standpoint the most SQL queries would start with a SELECT statement. In Data Warehouse environments, the relational model can be transformed into the following architectures: Star schema Snowflake schema Constellation schema Star schema architecture Star schema architecture is the simplest data warehouse design. The main feature of a star schema is a table at the center, called the fact table and the dimension tables which allow browsing of specific categories, summarizing, drill-downs and specifying criteria. Typically, most of the fact tables in a star schema are in database third normal form, while dimensional tables are de-normalized (second normal form). Fact table The fact table is not a typical relational database table as it is de-normalized on purpose - to enhance query response times. The fact table typically contains records that are ready to explore, usually with ad hoc queries. Records in the fact table are often referred to as events, due to the time-variant nature of a data warehouse environment. The primary key for the fact table is a composite of all the columns except numeric values / scores (like QUANTITY, TURNOVER, exact invoice date and time). Typical fact tables in a global enterprise data warehouse are (apart for those, there may be some company or business specific fact tables): sales fact table - contains all details regarding sales orders fact table - in some cases the table can be split into open orders and historical orders. Sometimes the values for historical orders are stored in a sales fact table. budget fact table - usually grouped by month and loaded once at the end of a year. forecast fact table - usually grouped by month and loaded daily, weekly or monthly. inventory fact table - report stocks, usually refreshed daily Dimension table Nearly all of the information in a typical fact table is also present in one or more dimension tables. The main purpose of maintaining Dimension Tables is to allow browsing the categories quickly and easily. The primary keys of each of the dimension tables are linked together to form the composite primary key of the fact table. In a star schema design, there is only one de-normalized table for a given dimension. Typical dimension tables in a data warehouse are: time dimension table customers dimension table products dimension table key account managers (KAM) dimension table sales office dimension table Star schema example An example of a star schema architecture is depicted below. Snowflake Schema architecture Snowflake schema architecture is a more complex variation of a star schema design. The main difference is that dimensional tables in a snowflake schema are normalized, so they have a typical relational database design. Snowflake schemas are generally used when a dimensional table becomes very big and when a star schema can’t represent the complexity of a data structure. For example if a PRODUCT dimension table contains millions of rows, the use of snowflake schemas should significantly improve performance by moving out some data to other table (with BRANDS for instance). The problem is that the more normalized the dimension table is, the more complicated SQL joins must be issued to query them. This is because in order for a query to be answered, many tables need to be joined and aggregates generated. An example of a snowflake schema architecture is depicted below. Fact constellation schema architecture For each star schema or snowflake schema it is possible to construct a fact constellation schema. This schema is more complex than star or snowflake architecture, which is because it contains multiple fact tables. This allows dimension tables to be shared amongst many fact tables. That solution is very flexible, however it may be hard to manage and support. The main disadvantage of the fact constellation schema is a more complicated design because many variants of aggregation must be considered. In a fact constellation schema, different fact tables are explicitly assigned to the dimensions, which are for given facts relevant. This may be useful in cases when some facts are associated with a given dimension level and other facts with a deeper dimension level. Use of that model should be reasonable when for example, there is a sales fact table (with details down to the exact date and invoice header id) and a fact table with sales forecast which is calculated based on month, client id and product id. In that case using two different fact tables on a different level of grouping is realized through a fact constellation model. An example of a constellation schema architecture is depicted below.