Introduction to Data Modeling
Last updated: 26th May 2003
Center of Excellence, Data Warehousing

Objectives
At the end of this lesson, you will know:
- Data modeling for the data warehouse
- What dimensions and facts are
- Star schemas and snowflake schemas
- Coverage tables
- Factless tables
- What to look for in modeling tools
- Some modeling tools

Data Modeling for Data Warehouse
- How do you structure the data in your data warehouse?
- A process that produces abstract data models for one or more database components of the data warehouse
- Modeling for the warehouse is different from modeling for an operational database
- Dimensional Modeling, Star Schema Modeling or Fact/Dimension Modeling

Modeling Techniques
Entity-Relationship Modeling
- Traditional modeling technique
- Technique of choice for OLTP
- Suited for corporate data warehouse
Dimensional Modeling
- Analyzes business measures in their specific business context
- Helps visualize very abstract business questions
- End users can easily understand and navigate the data structure

Entity-Relationship Modeling - Basic Concepts
- The ER modeling technique is a discipline used to illuminate the microscopic relationships among data elements.
- The highest art form of ER modeling is to remove all redundancy in the data.
- The result: databases that cannot be queried!
An Order Processing ER Model
[Diagram: ER model of order processing. Entities shown: Order Header, Order Details, Salesrep table, Customer Table, Item Table, City, Sales District, Sales Region, Sales Country, Product Brand, Product Category, linked by foreign keys (FK).]

Entity-Relationship Modeling - Basic Concepts
Entity
- Object that can be observed and classified by its properties and characteristics
- Business definition with a clear boundary
- Characterized by a noun
- Examples: Product, Employee

Entity-Relationship Modeling - Basic Concepts
Relationship
- Relationship between entities: structural interaction and association
- Described by a verb
- Cardinality: 1-1, 1-M, M-M
- Example: Books belong to Printed Media

Entity-Relationship Modeling - Basic Concepts
Attributes
- Characteristics and properties of entities
- Example: Book Id, Description and Book Category are attributes of the entity "Book"
- An attribute name should be unique and self-explanatory
- Primary keys, foreign keys and constraints are defined on attributes

Entity-Relationship Modeling - Why Not?
- End users cannot understand or remember an ER model.
- There is no graphical user interface (GUI) that takes a general ER model and makes it usable by end users.
- Software cannot usefully query a general ER model.
- Use of the ER modeling technique defeats the basic allure of data warehousing, namely intuitive and high-performance retrieval of data.

Dimensional Modeling - Basic Concepts
- Represents the data in a standard, intuitive framework that allows for high-performance access
- Schema designed to process large, complex, ad hoc and data-intensive queries
- No concern for concurrency, locking and insert/update/delete performance
- Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables.
- This characteristic "star-like" structure is often called a star join.
Star Schema
[Diagram: star schema with a central SALES table holding the measures AMOUNT and UNITS, surrounded by the dimensions REGION, PRODUCT, PERIOD and CUSTOMER. Attribute labels shown: DISTRICT, STATE, CITY, BRAND, CATEGORY, COLOR, SIZE, DAY, MONTH, QUARTER, YEAR, ADDRESS, CONTACT.]

Dimensional Modeling - Basic Concepts
Fact Tables
- The most useful facts in a fact table are numeric and additive
- Typically represents a business transaction, or an event that can be used in analyzing a business process
- By nature, fact tables are sparse
- Usually very large: billions of records

Dimensional Modeling - Basic Concepts
Dimension Tables
- Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table.
- Dimension tables most often contain descriptive textual information
- Determine the contextual background for facts
- Examples: Time, Location/Region, Customers

Dimensional Modeling - Basic Concepts
Measures
- A numeric attribute of a fact
- Represents performance or behavior of the business relative to the dimensions
- The actual numbers are called variables
- Occupy very little space compared to Fact Tables
- Examples: Quantity supplied, Transaction amount, Sales volume

Fact Table & Dimension Tables
- Fact Tables: numerical measurements of the business are stored in fact tables.
- Dimension Tables: dimensions are attributes about facts.
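The fact/dimension structure above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than any warehouse product, and the table names (`sales_fact`, `product_dim`, `period_dim`), their columns and the sample rows are hypothetical, loosely modeled on the labels in the star schema diagram.

```python
import sqlite3

def build_star_schema(conn):
    """Create a tiny star schema: one fact table, two dimension tables."""
    conn.executescript("""
        -- Dimension tables: single-part primary key, descriptive attributes.
        CREATE TABLE product_dim (
            product_key INTEGER PRIMARY KEY,
            brand TEXT,
            category TEXT
        );
        CREATE TABLE period_dim (
            period_key INTEGER PRIMARY KEY,
            month TEXT,
            year INTEGER
        );
        -- Fact table: multipart key made of the dimension keys,
        -- plus numeric, additive measures.
        CREATE TABLE sales_fact (
            product_key INTEGER REFERENCES product_dim(product_key),
            period_key INTEGER REFERENCES period_dim(period_key),
            amount REAL,
            units INTEGER,
            PRIMARY KEY (product_key, period_key)
        );
    """)
    # Hypothetical sample data.
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                     [(1, "Acme", "Tools"), (2, "Zenith", "Toys")])
    conn.executemany("INSERT INTO period_dim VALUES (?, ?, ?)",
                     [(1, "Jan", 2003), (2, "Feb", 2003)])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                     [(1, 1, 100.0, 10), (1, 2, 150.0, 15), (2, 1, 40.0, 4)])

def sales_by_category_and_year(conn):
    """Star join: the fact table joined to each dimension, measures summed."""
    return conn.execute("""
        SELECT p.category, t.year, SUM(f.amount), SUM(f.units)
        FROM sales_fact AS f
        JOIN product_dim AS p ON p.product_key = f.product_key
        JOIN period_dim AS t ON t.period_key = f.period_key
        GROUP BY p.category, t.year
        ORDER BY p.category, t.year
    """).fetchall()

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    for row in sales_by_category_and_year(connection):
        print(row)
```

The query touches the fact table once and reaches every descriptive attribute through a single join per dimension, which is the shape the "star join" name refers to.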
Conformed Dimensions
- A dimension that means the same thing with every possible fact table that it can be joined with
- Conformed dimensions are most essential for:
  - the Bus Architecture
  - the integrated function of the Data Warehouse
- Some common dimensions are: Customer, Product, Location, Time

Surrogate Keys
- All tables (facts and dimensions) should use Data Warehouse generated surrogate keys, not production keys
- Production keys sometimes get reused
- In case of mergers/acquisitions, surrogate keys protect you from different key formats
- Production systems may change to generalize their key definitions
- Using surrogate keys will be faster
- Can handle Slowly Changing Dimensions well

Slowly Changing Dimensions
Certain kinds of dimension attribute changes need to be handled differently in the Data Warehouse.
- Type I - Overwrite
  - e.g. name correction, description changes
- Type II - Partition History
  - e.g. packing change, customer movement
  - Create a new dimension record with a new surrogate key
- Type III - Organizational changes
  - e.g. sales force reorganization
  - Show sales broken down by new and old organizations
  - Need to create an old and a new field

Factless Fact Tables
- For event tracking, e.g. attendance
[Diagram: a factless fact table with the keys Date_Key, Student_Key, Course_Key, Teacher_Key and Facility_Key, joined to the Date, Student, Course, Teacher and Facility dimensions.]

Coverage Tables
Problem: how to find out which products on promotion did not sell?
[Diagram: a sales fact table with the keys Date_Key, Product_Key, Store_Key and Promotion_Key and the measures Dollars Sold and Units Sold, joined to the Date, Product, Store and Promotion dimensions.]

Coverage Tables
Solution - Coverage Tables
[Diagram: a Sales Promotion Coverage Table with the keys Date_Key, Product_Key, Store_Key and Promotion_Key, joined to the Date, Product, Store and Promotion dimensions.]

Snowflake Schema
- Dimension tables are normalized by decomposing at the attribute level
- Each dimension has one key for each level of the dimension's hierarchy
- Good performance when queries involve aggregation
- Complicated maintenance and metadata, and an explosion in the number of tables
- Makes the user representation more complex and intricate

Snowflake Schema - Example
[Diagram: a fact table surrounded by dimension tables.]

Aggregates
- Pre-stored summaries in the database
- Significant performance advantage
- Preferably should not be stored in fact tables
- May take significant time to build
- Many tools can automatically navigate to the most aggregated table that can service a query

Aggregate Navigators
- Automatically redirect queries to the most summarized table
- Some tools, such as Business Objects, Discoverer, Microstrategy and Metacube, support this
- Native database support is already available
[Diagram: LAN, Aggregate Navigator, DBMS.]

Examples of Data Modeling Tools
- ERWIN: supports Data Warehouse design as a modeling technique
- Powersoft WarehouseArchitect: module of Power Designer specifically for DW modeling
- Oracle Designer: can be extended for warehouse modeling
- Others, such as Infomodeler and Silverrun, are also used

Questions
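As a worked illustration of the coverage table slides, the sketch below answers "which products on promotion did not sell?". It rests on assumptions: Python's built-in sqlite3 stands in for the warehouse DBMS, and the table names (`sales_fact`, `promotion_coverage`, `product_dim`) and all sample rows are hypothetical. Only the key and measure names follow the diagrams.

```python
import sqlite3

def build_tables(conn):
    """Create a sales fact table and a factless promotion coverage table."""
    conn.executescript("""
        CREATE TABLE product_dim (
            product_key INTEGER PRIMARY KEY,
            name TEXT
        );
        -- Sales fact: a row exists only when a product actually sold.
        CREATE TABLE sales_fact (
            date_key INTEGER, product_key INTEGER,
            store_key INTEGER, promotion_key INTEGER,
            dollars_sold REAL, units_sold INTEGER
        );
        -- Coverage table: one row per product on promotion in a store
        -- on a date, whether or not it sold. It carries no measures.
        CREATE TABLE promotion_coverage (
            date_key INTEGER, product_key INTEGER,
            store_key INTEGER, promotion_key INTEGER
        );
    """)
    # Hypothetical sample data: three promoted products, two of which sold.
    conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                     [(1, "Soap"), (2, "Shampoo"), (3, "Toothpaste")])
    conn.executemany("INSERT INTO promotion_coverage VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 1), (1, 2, 1, 1), (1, 3, 1, 1)])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, 1, 1, 1, 20.0, 4), (1, 3, 1, 1, 9.0, 3)])

def promoted_but_unsold(conn):
    """Coverage rows with no matching sales row (an anti-join)."""
    rows = conn.execute("""
        SELECT DISTINCT p.name
        FROM promotion_coverage AS c
        JOIN product_dim AS p ON p.product_key = c.product_key
        LEFT JOIN sales_fact AS f
               ON f.date_key = c.date_key
              AND f.product_key = c.product_key
              AND f.store_key = c.store_key
              AND f.promotion_key = c.promotion_key
        WHERE f.product_key IS NULL
        ORDER BY p.name
    """).fetchall()
    return [name for (name,) in rows]

if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_tables(connection)
    print(promoted_but_unsold(connection))
```

The sales fact table alone cannot answer the question, because an unsold product leaves no row in it. The coverage table records what was on promotion, so the difference between the two tables is the answer.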