Introduction to Data Modeling
Last Updated: 26th May 2003
Center of Excellence
Data Warehousing
Objectives
 At the end of this lesson, you will know:
 Data Modeling for Data Warehouse
 What are dimensions and facts
 Star Schema and Snowflake Schemas
 Coverage Tables
 Factless Tables
 What to look for in Modeling tools
 Some modeling tools
Data Modeling for Data Warehouse
 How to structure the data in your data warehouse?
 Process that produces abstract data models for
one or more database components of the data
warehouse
 Modeling for Warehouse is different from that for
Operational database
 Dimensional Modeling, Star Schema Modeling or
Fact/Dimension Modeling
Modeling Techniques
 Entity-Relationship Modeling
 Traditional modeling technique
 Technique of choice for OLTP
 Suited for corporate data warehouse
 Dimensional Modeling
 Analyzing business measures in the specific business
context
 Helps visualize very abstract business questions
 End users can easily understand and navigate the data
structure
Entity-Relationship Modeling - Basic
Concepts
 The ER modeling technique is a discipline used to
illuminate the microscopic relationships among
data elements.
 The highest art form of ER modeling is to remove
all redundancy in the data.
 Creates databases that cannot be queried!
An Order Processing ER Model
[Diagram: order-processing ER model. Entities: City, Sales District, Sales Region, Sales Country, Salesrep table, Order Header, Order Details, Customer Table, Item Table, Product Brand, Product Category; connected by foreign keys (FK).]
Entity-Relationship Modeling - Basic
Concepts
 Entity
 Object that can be observed and classified by its
properties and characteristics
 Business definition with a clear boundary
 Characterized by a noun
 Example
 Product
 Employee
Entity-Relationship Modeling - Basic
Concepts
 Relationship
 Relationship between entities - structural interaction
and association
 described by a verb
 Cardinality
 1-1
 1-M
 M-M
 Example : Books belong to Printed Media
Entity-Relationship Modeling - Basic
Concepts
 Attributes
 Characteristics and properties of entities
 Example :
 Book Id, Description, book category are attributes of entity
“Book”
 Attribute name should be unique and self-explanatory
 Primary Key, Foreign Key, Constraints are defined on
Attributes
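As a minimal sketch of attributes, keys and constraints, the "Book" example can be written as tables using Python's built-in sqlite3 module. The book_category table and all column names are assumptions made for illustration; the slides only name the attributes Book Id, Description and book category.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE book_category (              -- hypothetical entity for the 1-M side
    category_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE book (
    book_id     INTEGER PRIMARY KEY,      -- primary key attribute
    description TEXT NOT NULL,            -- constraint on an attribute
    category_id INTEGER NOT NULL REFERENCES book_category(category_id)
);
""")
conn.execute("INSERT INTO book_category VALUES (1, 'Printed Media')")
conn.execute("INSERT INTO book VALUES (10, 'Intro to Data Modeling', 1)")

# A book pointing at a category that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO book VALUES (11, 'Orphan', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

book_count = conn.execute("SELECT COUNT(*) FROM book").fetchone()[0]
```

Here one category can have many books (a 1-M relationship), and the database itself refuses the orphan row.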
Entity-Relationship Modeling – Why Not?
 End users cannot understand or remember an ER
model.
 There is no graphical user interface (GUI) that takes a
general ER model and makes it usable by end
users.
 Software cannot usefully query a general ER
model.
 Use of the ER modeling technique defeats the
basic allure of data warehousing, namely intuitive
and high-performance retrieval of data.
Dimensional Modeling - Basic
Concepts
 Represents the data in a standard, intuitive
framework that allows for high-performance
access.
 Schema designed to process large, complex,
ad hoc and data-intensive queries.
 No concern for concurrency, locking and
insert/update/delete performance
 Every dimensional model is composed of one
table with a multipart key, called the fact table, and
a set of smaller tables called dimension tables.
 This characteristic "star-like" structure is often
called a star join.
Star Schema
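[Diagram: star schema. The central fact table is keyed by CITY, PRODUCT, PERIOD and CUSTOMER and holds the measures SALES AMOUNT and UNITS. Dimension attributes shown: DISTRICT, STATE, REGION (city); BRAND, CATEGORY, COLOR, SIZE (product); DAY, MONTH, QUARTER, YEAR (period); ADDRESS, CATEGORY, CONTACT (customer).]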
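A star schema of this shape can be sketched with Python's sqlite3 module. This is only an illustration under assumptions: the table names (sales_fact, product_dim, period_dim, city_dim), the reduced set of columns and the sample rows are all hypothetical, and the customer dimension is left out to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, brand TEXT, category TEXT);
CREATE TABLE period_dim  (period_key  INTEGER PRIMARY KEY, day TEXT, month TEXT, year INTEGER);
CREATE TABLE city_dim    (city_key    INTEGER PRIMARY KEY, city TEXT, region TEXT);
-- Fact table: multipart key made of the dimension keys, plus additive measures.
CREATE TABLE sales_fact (
    product_key  INTEGER REFERENCES product_dim(product_key),
    period_key   INTEGER REFERENCES period_dim(period_key),
    city_key     INTEGER REFERENCES city_dim(city_key),
    sales_amount REAL,
    units        INTEGER,
    PRIMARY KEY (product_key, period_key, city_key)
);
INSERT INTO product_dim VALUES (1, 'Acme', 'Snacks'), (2, 'Zest', 'Drinks');
INSERT INTO period_dim  VALUES (1, '2003-05-01', '2003-05', 2003),
                               (2, '2003-05-02', '2003-05', 2003);
INSERT INTO city_dim    VALUES (1, 'Pune', 'West'), (2, 'Chennai', 'South');
INSERT INTO sales_fact  VALUES (1, 1, 1, 100.0, 10), (1, 2, 2, 50.0, 5), (2, 1, 1, 30.0, 3);
""")

# Star join: the fact table joins directly to each dimension it needs.
star_join = """
SELECT p.category, c.region, SUM(f.sales_amount), SUM(f.units)
FROM sales_fact f
JOIN product_dim p ON p.product_key = f.product_key
JOIN city_dim    c ON c.city_key    = f.city_key
GROUP BY p.category, c.region
ORDER BY p.category, c.region
"""
rows = conn.execute(star_join).fetchall()
```

Each dimension is one join away from the fact table, which is what keeps such queries easy to write and to read.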
Dimensional Modeling - Basic
Concepts
 Fact Tables
 The most useful facts in a fact table are numeric and
additive
 Typically represents a business transaction, or event
that can be used in analyzing business process
 By nature fact tables are sparse
 Usually very large - billions of records
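One way to picture sparsity is to compare the number of rows actually stored with the number of key combinations the dimensions allow. The sketch below assumes made-up dimension sizes and row counts; fact_density is a hypothetical helper, not part of any tool.

```python
from math import prod

def fact_density(fact_rows: int, dimension_sizes: list[int]) -> float:
    """Fraction of all possible key combinations that actually have a fact row."""
    return fact_rows / prod(dimension_sizes)

# Hypothetical sizes: 1,000 products x 365 days x 100 stores, 2 million sales rows.
density = fact_density(2_000_000, [1_000, 365, 100])
```

With these assumed numbers only about 5% of the possible product/day/store combinations have a row; the rest are simply absent rather than stored as zeros.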
Dimensional Modeling - Basic
Concepts
 Dimension Tables
 Each dimension table has a single-part primary key that
corresponds exactly to one of the components of the
multipart key in the fact table.
 Dimension tables most often contain descriptive textual
information
 Determine contextual background for facts
 Examples :
 Time
 Location/Region
 Customers
Dimensional Modeling - Basic
Concepts
 Measures
 A numeric attribute of a fact
 Represents performance or behavior of the business
relative to the dimensions
 The actual numbers are called variables
 Occupy very little space compared to Fact Tables
 Examples :
 Quantity supplied
 Transaction amount
 Sales volume
Fact Table & Dimension Tables
 Fact Tables
 Numerical measurements of the business are stored in
fact tables.
 Dimension Tables
 Dimensions are attributes about facts.
Conformed Dimensions
 A dimension that means the same thing in every
fact table it can be joined with
 Conformed dimensions are essential for
 The Bus Architecture
 The integrated function of the Data Warehouse
 Some common dimensions are :
 Customer
 Product
 Location
 Time
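A conformed dimension lets results from different fact tables be lined up side by side. The sketch below assumes two hypothetical fact tables, sales_fact and shipments_fact, that share one product_dim; the names and rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim    (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE sales_fact     (product_key INTEGER, units_sold    INTEGER);
CREATE TABLE shipments_fact (product_key INTEGER, units_shipped INTEGER);
INSERT INTO product_dim    VALUES (1, 'Widget'), (2, 'Gadget');
INSERT INTO sales_fact     VALUES (1, 4), (1, 6), (2, 3);
INSERT INTO shipments_fact VALUES (1, 12), (2, 5);
""")

# Summarize each fact table separately, then combine on the shared dimension.
drill_across = """
SELECT p.product_name, s.sold, h.shipped
FROM product_dim p
JOIN (SELECT product_key, SUM(units_sold) AS sold
      FROM sales_fact GROUP BY product_key) s ON s.product_key = p.product_key
JOIN (SELECT product_key, SUM(units_shipped) AS shipped
      FROM shipments_fact GROUP BY product_key) h ON h.product_key = p.product_key
ORDER BY p.product_name
"""
rows = conn.execute(drill_across).fetchall()
```

Aggregating each fact table before joining avoids multiplying rows, and the comparison is only meaningful because both tables use the same product keys with the same meaning.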
Surrogate Keys
 Tables (facts and dimensions) should use Data Warehouse
generated surrogate keys, not production keys
 Production keys are sometimes reused
 In case of mergers/acquisitions, protects you from
different key formats
 Production systems may change their
key definitions
 Joins on compact integer surrogate keys are usually faster
 Handles Slowly Changing Dimensions well
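Surrogate key assignment can be sketched as a lookup from production keys to warehouse-generated integers. SurrogateKeyGenerator and the source-system names below are hypothetical; a real load process would typically persist this mapping in a table.

```python
from itertools import count

class SurrogateKeyGenerator:
    """Maps (source system, production key) pairs to warehouse-generated integers."""

    def __init__(self) -> None:
        self._next = count(1)
        self._keys: dict[tuple[str, str], int] = {}

    def key_for(self, source: str, production_key: str) -> int:
        natural = (source, production_key)
        if natural not in self._keys:
            # First time this production key is seen: issue a new surrogate.
            self._keys[natural] = next(self._next)
        return self._keys[natural]

keys = SurrogateKeyGenerator()
a = keys.key_for("erp", "CUST-001")
b = keys.key_for("acquired_co", "000017")  # different key format, same key space
c = keys.key_for("erp", "CUST-001")        # repeat lookup returns the same key
```

Because the warehouse owns the integers, a change of format on the production side only affects the mapping, not the fact tables.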
Slowly Changing Dimensions
 Certain kinds of dimension attribute changes need
to be handled differently in the Data Warehouse
 Type I - Overwrite
 e.g. name correction, description changes
 Type II - Partition History
 e.g. packaging change, customer movement
 Create a new dimension record with a new surrogate key
 Type III - Organizational changes
 e.g. sales force reorganization
 Show sales broken down by both the old and the new organization
 Need to create an old and a new field
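The three types can be sketched on in-memory dimension rows. Everything here is an assumption for illustration: the column names, the "current" flag used to mark the active Type II row, and the sample values.

```python
from itertools import count

next_key = count(1)
customer_dim = [
    {"customer_key": next(next_key), "customer_id": "C1", "name": "Jon Smith",
     "city": "Pune", "current": True},
]

def current_row(customer_id):
    return next(r for r in customer_dim
                if r["customer_id"] == customer_id and r["current"])

def type1_overwrite(customer_id, name):
    """Type I: fix the value in place; no history is kept."""
    current_row(customer_id)["name"] = name

def type2_new_row(customer_id, city):
    """Type II: close the current row and add a new one with a new surrogate key."""
    old = current_row(customer_id)
    old["current"] = False
    customer_dim.append({**old, "customer_key": next(next_key),
                         "city": city, "current": True})

def type3_reassign(row, new_region):
    """Type III: keep the previous value in an 'old' field beside the new one."""
    row["old_region"] = row["region"]
    row["region"] = new_region

type1_overwrite("C1", "John Smith")   # name correction
type2_new_row("C1", "Mumbai")         # customer moved

salesrep_dim = [{"salesrep_key": 1, "name": "A. Rao", "region": "North"}]
type3_reassign(salesrep_dim[0], "North-West")  # sales force reorganization
```

After these calls the customer has two rows (surrogate keys 1 and 2) so old facts still point at the Pune row, while the sales rep has a single row carrying both the old and the new region.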
Factless Fact Tables
 For event tracking, e.g. attendance
[Diagram: factless attendance fact table with keys Date_Key, Student_Key, Course_Key, Teacher_Key, Facility_Key, joined to the Date, Student, Course, Teacher and Facility dimensions.]
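Since a factless fact table holds only keys, the usual way to query it is to count rows. The sketch below assumes a hypothetical attendance_fact table with the five keys from the diagram and made-up sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- No measure columns: each row only records that an attendance event happened.
CREATE TABLE attendance_fact (
    date_key INTEGER, student_key INTEGER, course_key INTEGER,
    teacher_key INTEGER, facility_key INTEGER
);
INSERT INTO attendance_fact VALUES
    (20030526, 1, 100, 7, 3),
    (20030526, 2, 100, 7, 3),
    (20030526, 1, 200, 8, 4);
""")

rows = conn.execute("""
SELECT course_key, COUNT(*) AS attendances
FROM attendance_fact
GROUP BY course_key
ORDER BY course_key
""").fetchall()
```

With these rows, course 100 has two attendances and course 200 has one.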
Coverage Tables
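 Problem: how to find out which products on promotion did not sell?
 The fact table records only what was sold, so a promoted product that did not sell leaves no row
[Diagram: sales fact table with keys Date_Key, Product_Key, Store_Key, Promotion_Key and measures Dollars Sold, Units Sold, joined to the Date, Product, Store and Promotion dimensions.]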
Coverage Tables
 Solution - Coverage Tables
[Diagram: Sales Promotion Coverage Table with keys Date_Key, Product_Key, Store_Key, Promotion_Key, joined to the Date, Product, Store and Promotion dimensions.]
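The question "which products on promotion did not sell?" can be answered by taking the coverage table and removing whatever appears in the sales fact table. This is a sketch under assumptions: the table names promotion_coverage and sales_fact and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (
    date_key INTEGER, product_key INTEGER, store_key INTEGER,
    promotion_key INTEGER, dollars_sold REAL, units_sold INTEGER
);
CREATE TABLE promotion_coverage (
    date_key INTEGER, product_key INTEGER, store_key INTEGER, promotion_key INTEGER
);
-- Products 1, 2 and 3 were on promotion 9 in store 5 on this date...
INSERT INTO promotion_coverage VALUES
    (20030526, 1, 5, 9), (20030526, 2, 5, 9), (20030526, 3, 5, 9);
-- ...but only products 1 and 3 sold.
INSERT INTO sales_fact VALUES
    (20030526, 1, 5, 9, 40.0, 4), (20030526, 3, 5, 9, 15.0, 1);
""")

# Coverage rows with no matching sales row are the promoted products that did not sell.
unsold = conn.execute("""
SELECT c.product_key
FROM promotion_coverage c
LEFT JOIN sales_fact f
       ON  f.date_key = c.date_key AND f.product_key = c.product_key
       AND f.store_key = c.store_key AND f.promotion_key = c.promotion_key
WHERE f.product_key IS NULL
ORDER BY c.product_key
""").fetchall()
```

Here the query returns product 2, the only promoted product without a sales row.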
Snowflake Schema
 Dimension tables are normalized by decomposing
at the attribute level
 Each dimension has one key for each level of the
dimension’s hierarchy
 Good performance when queries involve
aggregation
 Complicated maintenance and metadata,
explosion in the number of tables.
 Makes user representation more complex and
intricate
Snowflake schema - Example

[Diagram: a central Fact Table linked to Dim Tables, which are in turn linked to further Dim Tables.]
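A snowflaked product dimension might look like the sketch below, where each hierarchy level gets its own table and key. The three-level hierarchy (product, brand, category) and all names are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table, and one key, per level of the dimension's hierarchy.
CREATE TABLE category_dim (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE brand_dim    (brand_key INTEGER PRIMARY KEY, brand TEXT,
                           category_key INTEGER REFERENCES category_dim(category_key));
CREATE TABLE product_dim  (product_key INTEGER PRIMARY KEY, product TEXT,
                           brand_key INTEGER REFERENCES brand_dim(brand_key));
CREATE TABLE sales_fact   (product_key INTEGER REFERENCES product_dim(product_key),
                           units INTEGER);
INSERT INTO category_dim VALUES (1, 'Snacks');
INSERT INTO brand_dim    VALUES (1, 'Acme', 1), (2, 'Crunch', 1);
INSERT INTO product_dim  VALUES (1, 'Chips', 1), (2, 'Nuts', 2);
INSERT INTO sales_fact   VALUES (1, 10), (2, 5), (1, 2);
""")

# Reaching the category level now takes a chain of joins through the hierarchy.
rows = conn.execute("""
SELECT c.category, b.brand, SUM(f.units)
FROM sales_fact f
JOIN product_dim  p ON p.product_key  = f.product_key
JOIN brand_dim    b ON b.brand_key    = p.brand_key
JOIN category_dim c ON c.category_key = b.category_key
GROUP BY c.category, b.brand
ORDER BY b.brand
""").fetchall()
```

The extra tables and joins are the "more complex and intricate" representation that the slide warns about.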
Aggregates
 Pre-stored summaries in the database
 Significant Performance advantage
 Preferably should not be stored in fact tables.
 May take significant time to build aggregates
 Many tools can automatically navigate to the most
aggregated table that can service a query
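A pre-stored summary can be sketched as a separate table built from the fact table. The names sales_fact and sales_month_agg, the month_key column and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (date_key INTEGER, month_key INTEGER,
                         product_key INTEGER, units INTEGER);
INSERT INTO sales_fact VALUES
    (20030501, 200305, 1, 4), (20030502, 200305, 1, 6), (20030601, 200306, 1, 5);
-- Pre-stored monthly summary, kept in its own table rather than in the fact table.
CREATE TABLE sales_month_agg AS
    SELECT month_key, product_key, SUM(units) AS units
    FROM sales_fact
    GROUP BY month_key, product_key;
""")

monthly = conn.execute(
    "SELECT month_key, units FROM sales_month_agg ORDER BY month_key"
).fetchall()
```

Monthly questions can then read the much smaller summary table; the cost is that the summary must be rebuilt or refreshed whenever the fact table is loaded.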
Aggregate Navigators
 Automatically redirect queries to the most
summarized table
 Tools such as Business Objects, Discoverer,
Microstrategy and Metacube support this
 Native database support is already available
[Diagram: LAN → Aggregate Navigator → DBMS]
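The core decision an aggregate navigator makes can be sketched as "pick the smallest table that still carries everything the query needs". This is a simplified illustration, not how any of the named tools is implemented; CandidateTable, navigate and the table metadata are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CandidateTable:
    name: str
    columns: frozenset[str]   # dimension levels and measures the table carries
    row_count: int

def navigate(needed: set[str], tables: list[CandidateTable]) -> str:
    """Return the smallest table that carries every column the query needs."""
    usable = [t for t in tables if needed <= t.columns]
    if not usable:
        raise LookupError(f"no table can answer a query on {sorted(needed)}")
    return min(usable, key=lambda t: t.row_count).name

tables = [
    CandidateTable("sales_fact",
                   frozenset({"day", "month", "year", "product", "units"}), 1_000_000),
    CandidateTable("sales_month_agg",
                   frozenset({"month", "year", "product", "units"}), 40_000),
    CandidateTable("sales_year_agg",
                   frozenset({"year", "units"}), 10),
]

yearly_target = navigate({"year", "units"}, tables)
monthly_target = navigate({"month", "product", "units"}, tables)
daily_target = navigate({"day", "units"}, tables)
```

A yearly total is sent to sales_year_agg, a monthly product breakdown to sales_month_agg, and anything at day level falls back to the base fact table.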
Examples of Data Modeling Tools
 ERWIN
 Supports Data Warehouse design as a modeling
technique
 Powersoft WarehouseArchitect
 Module of Power Designer specifically for DW Modeling
 Oracle Designer
 Can be extended for Warehouse modeling
 Others, such as Infomodeler and Silverrun, are also used
Questions