Introduction to
Business Intelligence
Prem Shanker
Sr. Software Engineer
Credit Suisse
Goals
• Learn the concept of data warehousing and what BIDS offers.
• Learn how to design and implement a dimensional data warehouse database.
• Learn what a cube is.
• Learn about the SQL Server Analysis Services architecture.
• Learn what is new in Analysis Services 2008.
• Learn what the MDX language is.
What can BIDS do?
[Diagram: Source Systems/OLTP, SQL Server Data Warehouse, Analysis Services Cubes, Clients (Query Tools, Reporting, Analysis)]
1. Design the Data Warehouse
2. Populate the Data Warehouse
3. Create OLAP Cubes
4. Query Data
Data Warehouse
• Table and Cube
• Star Schema and Snowflake Schema
• Fact Table and Dimension Table
Table vs Cube
• A simplified example:
A typical relational table (Sales table): data are organized by rows
Product | Region | Sales $
Donut   | East   | 1
Donut   | West   | 2
Milk    | East   | 3
Milk    | West   | 4
Make it into a cube: data are organized by intersections of the Product dim (rows) and the Region dim (columns)
Product | East | West | Total
Donut   | 1    | 2    | 3
Milk    | 3    | 4    | 7
Total   | 4    | 6    | 10
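The table-to-cube step can be sketched in a few lines of Python. This is only an illustration of the idea, not how Analysis Services builds cubes: the function name `build_cube` and the `"Total"` label are assumptions made for the example, and the data is the Donut/Milk sales from the slide.

```python
from collections import defaultdict

# Rows of the relational Sales table: (product, region, sales)
sales_rows = [
    ("Donut", "East", 1),
    ("Donut", "West", 2),
    ("Milk", "East", 3),
    ("Milk", "West", 4),
]


def build_cube(rows):
    """Aggregate rows into cells keyed by (product, region), including totals."""
    cube = defaultdict(int)
    for product, region, amount in rows:
        # Each row feeds its own cell plus the row total, column total, and grand total.
        cube[(product, region)] += amount
        cube[(product, "Total")] += amount
        cube[("Total", region)] += amount
        cube[("Total", "Total")] += amount
    return dict(cube)


cube = build_cube(sales_rows)
print(cube[("Milk", "West")])    # the cell at the Milk / West intersection
print(cube[("Total", "Total")])  # the grand total
```

Once the cells are stored by intersection, answering "Milk in the West" is a single lookup instead of a scan over rows.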
The basic ingredients to make a cube
• Two kinds of tables in a data warehouse database:
1. Fact table
2. Dimension tables
• Question:
1. Which one is a fact table and which one is a dimension table?
Star Schema
• A Star Schema contains a fact table and one or more dimension tables.
1. Fact table: the central fact table stores the numeric facts (measures) such as sales dollars, costs, and unit sales.
2. Dimension tables: they surround the central fact table and store descriptive information about the measures.
• The shape looks like a star.
[Figure: Star schema]
Snowflake Schema
Review: Data Warehouse Schemas
– The data warehouse is either a star schema or a snowflake schema:
• Fact tables contain foreign keys and numeric measures.
• Dimension tables contain the data that describes the measures.
• The schema is ready for Analysis Services to build a cube.
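A minimal star schema can be tried out with Python's built-in sqlite3 module. The table and column names (`fact_sales`, `dim_product`, `dim_region`, and so on) are hypothetical, chosen only for this sketch; a real warehouse would live in SQL Server and have many more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold the descriptive attributes.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_region  (region_key  INTEGER PRIMARY KEY, region_name  TEXT);
-- The fact table holds foreign keys to each dimension plus the numeric measure.
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product(product_key),
    region_key   INTEGER REFERENCES dim_region(region_key),
    sales_amount REAL
);
INSERT INTO dim_product VALUES (1, 'Donut'), (2, 'Milk');
INSERT INTO dim_region  VALUES (1, 'East'), (2, 'West');
INSERT INTO fact_sales  VALUES (1, 1, 1), (1, 2, 2), (2, 1, 3), (2, 2, 4);
""")


def sales_by_product(connection):
    """Join the fact table to one dimension and sum the measure per product."""
    query = """
        SELECT p.product_name, SUM(f.sales_amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.product_name
        ORDER BY p.product_name
    """
    return connection.execute(query).fetchall()


totals = sales_by_product(conn)
print(totals)
```

The fact table never stores the product name itself, only the key; every descriptive question is answered by joining out to a dimension table.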
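Client Server Architecture
[Diagram: clients (Excel, client apps, BIDS, SSMS, SSRS, MOSS) reach the Analysis Server through the OLEDB, ADOMD, ADOMD.NET, and AMO libraries, which speak XMLA over TCP, or over HTTP through IIS.]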
A Logical Cube - Example
[Figure: a cube with three dimensions]
• Product: Donut, Sandwich, Milk, Soda, Beer
• Region: North, South, East, West
• Time: 1999, 2000, 2001, 2002
• Highlighted cell: the Sales $ of Soda in the West region in 2001
Tools to connect to Cubes
• SQL Server Management Studio (SSMS)
• Business Intelligence Development Studio (BIDS)
• Query Analyzer (SSMS) – to write MDX
• Excel 2007 – uses MDX
Physical Cube - BIDS
• Analysis Services Database
• Unified Dimensional Model
• Data Source connection
• Data Source View
• Dimensions
• Cube Creation Wizard
Analysis Services Database
• An Analysis Services database is the top level container for
other dependent objects:
• A database includes
– Data Source
– Data Source View
– Cube
– Dimension
– Security Role
Creating an Analysis
Services Database
• You can use one of the following to create a new empty
database on an instance of SQL Server 2005 Analysis
Services.
– SQL Server Management Studio
– Business Intelligence Development Studio.
Unified Dimensional Modeling
• Common name: UDM
• New feature since AS 2005
• Combines all relational sources in one single environment.
• A single data model, called the Unified Dimensional Model (UDM), sits over one or more physical data sources.
Unified Dimensional Model - Concept
• Without a UDM, the user needs to understand the particulars of each technology (e.g. the dialect of SQL used) to generate reports.
• Within one single Analysis Services database, you can have more than one data source to pull the data from.
Data Source Connection
• The data sources of your AS database are your data warehouse databases (SQL).
• A data source defines the connection string and authentication information for a database on an OLE DB data provider.
• You can use the Data Source Wizard to specify one or more data sources (SQLDB) for Analysis Services databases.
The Functions of the Data
Sources
• Integrate your Analysis Services databases with the data
warehouses
• They are used for the following:
– Processing the Cubes and dimensions
– Data Retrieval if ROLAP or HOLAP is used as the
storage.
– Write Back
Different Storage types of
Cube
Data Sources connection
to SQL Server
• For SQL Server, you can pick from the following providers:
– OLE DB provider for SQL Server
– SQL Native Client
– .NET Provider/SqlClient Data Provider
– (Avoid using .NET data sources – OLEDB is faster for processing in practice.)
Data Source Views
• New feature since AS 2005
• A single unified view of the metadata (UDM) from specified tables and views that the data source defines in the project.
• It hides the physical implementation of the underlying data sources from the reporting users.
• Basic data layout for cubes
• Define data relationships
• Can leverage multiple data sources
• The key to effective cube design
• Named queries as objects, not only tables or views
Demo
Dimension
• All dimensions are based on tables or views in a data source view.
• All dimensions are shared since AS 2005.
• The structure of a dimension is largely driven by the structure of the underlying dimension table or tables.
• The simplest structure is called a star schema, in which each dimension is based on a single dimension table that is directly linked to the fact table by a primary key - foreign key relationship.
Dimension Consists of
• A dimension consists of:
– Attributes that describe the entity
– User-defined hierarchies that organize dimension members in meaningful ways
• such as Store Name → Store City → Store State → Store Country
Attributes
• New feature since AS 2005
• Containers of dimension members
• Typically have one-to-many relationships between attributes in the same dimension:
– City → State
– State → Country, etc.
– All attributes are implicitly related to the key
User Defined Hierarchies
• User-defined hierarchies are created from attributes
• Tree-like structure: City → State → Country → All
• Provide navigation paths in a cube
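How a hierarchy provides a navigation path can be sketched as a roll-up from one level to the next. The cities, states, and sales figures below are made up for the example, and `roll_up` is a hypothetical helper, not an Analysis Services function.

```python
# Hypothetical attribute relationships: each city belongs to one state,
# and each state belongs to one country.
city_to_state = {"Seattle": "WA", "Spokane": "WA", "Portland": "OR"}
state_to_country = {"WA": "USA", "OR": "USA"}

# Hypothetical sales recorded at the lowest level (City).
city_sales = {"Seattle": 5, "Spokane": 2, "Portland": 3}


def roll_up(values, parent_of):
    """Sum each member's value into its parent member one level up."""
    totals = {}
    for member, amount in values.items():
        parent = parent_of[member]
        totals[parent] = totals.get(parent, 0) + amount
    return totals


state_sales = roll_up(city_sales, city_to_state)        # City -> State
country_sales = roll_up(state_sales, state_to_country)  # State -> Country
all_sales = sum(country_sales.values())                 # Country -> All
print(state_sales, country_sales, all_sales)
```

Drilling down in a client tool walks the same path in the opposite direction, from All to Country to State to City.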
Typical Example – Calendar Hierarchy
• The Year, Quarter, and Month attributes are used to construct a hierarchy, named Calendar, in the Time dimension.
• The relationship between the levels and members of the Calendar dimension (a regular dimension) is shown in the following diagram.
Measure Group
• In a cube, a measure is a set of values, usually numeric, that is based on a column in the cube's fact table.
• A measure group contains one, several, or all of the measures from a single fact table. It can’t contain measures from different fact tables.
Measure Group Advantages
• Measure groups provide the following advantages:
– They can be partitioned and processed separately.
– They allow a cube to include measures from different fact tables, one measure group per fact table.
– They are grouped by granularity: same measure group, same granularity.
– Security can be applied to specific measure groups.
Cube
• A cube is defined by its measures and dimensions.
Inside a Cube
• Measures and Measure Groups
• Dimension Relationships
• Calculations
• Actions
• Partitions
• Perspectives
Demo
Dimension Design
• Different dimension relationships
– Regular Dimension Relationship
– Reference Dimension Relationship
– Fact Dimension Relationship
– Role Playing Dimension
– Parent-Child Hierarchy
Regular Dimension Relationships
• A traditional star schema design
• The primary key in the dimension table joins directly to the foreign key in the fact table.
Reference Dimension Relationships
• Snowflake schema
• A reference dimension uses columns from multiple tables, or its dimension table is linked to the fact table indirectly, through another dimension that is directly linked to the fact table.
Role Playing Dimension
• It is used in a cube more than one time, each time for a different purpose.
• Each role-playing dimension is joined to a fact table on a different foreign key.
• For example, you might add a Time dimension to a cube three times to track the times that
– products are ordered,
– products are shipped,
– orders are due.
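On the relational side, a role-playing dimension is one physical table joined several times under different aliases. The sketch below uses Python's sqlite3 module with hypothetical table and column names (`dim_time`, `fact_orders`, `order_date_key`, and so on) and a single made-up order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One physical Time dimension table.
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, calendar_date TEXT);
-- The fact table references it through three different foreign keys.
CREATE TABLE fact_orders (
    order_id       INTEGER PRIMARY KEY,
    order_date_key INTEGER REFERENCES dim_time(time_key),
    ship_date_key  INTEGER REFERENCES dim_time(time_key),
    due_date_key   INTEGER REFERENCES dim_time(time_key)
);
INSERT INTO dim_time VALUES (1, '2004-01-05'), (2, '2004-01-08'), (3, '2004-01-12');
INSERT INTO fact_orders VALUES (100, 1, 2, 3);
""")

# Each alias of dim_time plays a different role: order date, ship date, due date.
row = conn.execute("""
    SELECT o.calendar_date, s.calendar_date, d.calendar_date
    FROM fact_orders AS f
    JOIN dim_time AS o ON o.time_key = f.order_date_key
    JOIN dim_time AS s ON s.time_key = f.ship_date_key
    JOIN dim_time AS d ON d.time_key = f.due_date_key
    WHERE f.order_id = 100
""").fetchone()
print(row)
```

No second or third copy of the Time table is needed; only the join key changes per role.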
Parent-Child Hierarchy
• A parent-child hierarchy is a hierarchy in a standard dimension that contains a parent attribute. A parent attribute describes a self-join within the same dimension table.
• Example: Employee Hierarchy
– An employee reports to his/her manager. The manager is an employee as well.
– Employee Key self-joins to ParentEmployeeKey.
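The self-join can be illustrated with a small in-memory employee table. The names and keys are invented for the example, and `management_chain` and `direct_reports` are hypothetical helpers; only the idea of a row pointing at its parent row in the same table comes from the slide.

```python
# Hypothetical employee dimension rows: employee key -> (name, parent employee key)
employees = {
    1: ("Ana", None),  # top of the hierarchy, no manager
    2: ("Ben", 1),
    3: ("Cho", 1),
    4: ("Dev", 2),
}


def management_chain(employee_key, rows):
    """Follow the parent key upward and return names from the employee to the top."""
    chain = []
    key = employee_key
    while key is not None:
        name, parent_key = rows[key]
        chain.append(name)
        key = parent_key
    return chain


def direct_reports(manager_key, rows):
    """Return the names of employees whose parent key is the given manager."""
    return sorted(name for name, parent in rows.values() if parent == manager_key)


print(management_chain(4, employees))
print(direct_reports(1, employees))
```

Because the depth of the tree is not fixed in advance, the number of levels is discovered from the data rather than declared as separate attributes.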
Slowly Changing Dimension
• Some attribute values may change over time.
• Two basic techniques:
– Type 1 change
– Type 2 change
Slowly Changing Dimension – Type 1
• A Type 1 change simply overwrites the old value with the new one.
Slowly Changing Dimension – Type 2
• You create a new dimension row with the new value and a new surrogate key, and mark the old row as no longer in effect (with a flag or timestamp). The fact table will use the new surrogate key to link new fact measurements.
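The difference between the two techniques is easier to see side by side. The following is a minimal sketch under stated assumptions: the `CustomerRow` class, its fields, and the `is_current` flag are hypothetical, and a real implementation would usually also carry effective-date columns.

```python
from dataclasses import dataclass


@dataclass
class CustomerRow:
    surrogate_key: int
    customer_id: str  # business key from the source system
    city: str
    is_current: bool = True


def apply_type1(rows, customer_id, new_city):
    """Type 1: overwrite the value in place; the old value is lost."""
    for row in rows:
        if row.customer_id == customer_id:
            row.city = new_city


def apply_type2(rows, customer_id, new_city):
    """Type 2: expire the current row and append a row with a new surrogate key."""
    next_key = max(row.surrogate_key for row in rows) + 1
    for row in rows:
        if row.customer_id == customer_id and row.is_current:
            row.is_current = False
    rows.append(CustomerRow(next_key, customer_id, new_city))
    return next_key


type1_dim = [CustomerRow(1, "C001", "Boston")]
apply_type1(type1_dim, "C001", "Denver")

type2_dim = [CustomerRow(1, "C001", "Boston")]
new_key = apply_type2(type2_dim, "C001", "Denver")
print(type1_dim)
print(type2_dim, new_key)
```

With Type 2, facts loaded before the change keep pointing at surrogate key 1 (Boston), while new facts use the returned key, so history is preserved.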
Calculated Member
• A calculated member is a member of a dimension or a measure group that is defined by an MDX expression.
• The value for the member is calculated at runtime. The result values are not stored on disk.
Calculated Member
Properties
Named Set
• A named set is an MDX expression that returns a set of dimension members.
• You can define named sets and save them as part of the cube definition.
• It allows you to reuse the same named set throughout the cube.
• Typical example:
– Create a list of the Top 10 customers based on Sales
– You can reuse the same Top 10 customers in different queries.
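The reuse idea behind a named set can be shown by analogy in Python: define the member list once, then use it in more than one query. This is not MDX and not how Analysis Services evaluates sets; the customer names, figures, and the `top_customers` helper are all invented for the sketch.

```python
# Hypothetical per-customer figures.
customer_sales = {"Ann": 120, "Bob": 300, "Cy": 80, "Dee": 210}
customer_orders = {"Ann": 4, "Bob": 9, "Cy": 2, "Dee": 5}


def top_customers(sales, count):
    """Return the names of the `count` customers with the highest sales."""
    ranked = sorted(sales, key=lambda name: sales[name], reverse=True)
    return ranked[:count]


# Define the set once...
top_2 = top_customers(customer_sales, 2)

# ...and reuse it in two different "queries".
sales_for_top = {name: customer_sales[name] for name in top_2}
orders_for_top = {name: customer_orders[name] for name in top_2}
print(top_2, sales_for_top, orders_for_top)
```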
Best practices for Cube Design
• Use integer or numeric for key columns.
• Avoid ROLAP storage mode, particularly with custom rollup or unary operators. MOLAP is the fastest storage structure in SSAS.
• Use parent-child dimensions prudently, especially those containing custom rollup and unary operators. There is no aggregation support in parent-child dimensions.
Best practices for Cube Design (Contd.)
• Use role-playing dimensions (e.g. OrderDate, BillDate, ShipDate) to avoid multiple physical copies. If the dimensions are based on the same physical table(s), use role-playing dimensions.
What's New (Analysis
Services - Multidimensional
Database)
• New Attribute Relationship designer. The dimension
editor has a new Attribute Relationship designer that
makes it easier to browse and modify attribute
relationships.
• New AMO Warnings. These new warning messages
alert users when they depart from design best practices
or make logical errors in database design.
What's New (Analysis
Services - Multidimensional
Database)
• Backup and Restore Improvements
• The backup and restore functionality in Analysis Services has a new
storage structure and enhanced performance in all backup and
restore scenarios.
• Improved Storage Structure
• The new storage structure provides a more robust repository for the
archived database. By using the new storage structure, there is no
practical limit to the size of the database file, nor is there a limit to
the number of files that a database can have.
• Improved Performance
• The new backup and restore functionality achieves increased
performance. Tests on different sized databases and with various
numbers of files have shown significant performance improvements.
What's New (Analysis
Services - Multidimensional
Database)
• Dynamic Management Views
• Monitoring Connections, Sessions, and Commands
Discover_Connections, Discover_Sessions, and
Discover_Commands.
• select * from $system.discover_connections
Fetching Data from Cube
• What Is MDX
• Testing MDX with the Query Tool in SQL Server Management Studio
• The Basic Elements of an MDX Query
What Is MDX
• An extension of SQL syntax that:
– Queries and manipulates multidimensional data in OLAP cubes
– Defines calculations based on information in the cube
– Defines and populates local cubes
• Not a true extension:
– Syntax deviates significantly from SQL
Testing MDX with
Management Studio
Background
Select
on axis (x),
on axis (y),
on axis (z)
From [cubeName]
Every cell has a name...
[Figure: cube with axes Products (Components, Clothing, Bikes), Time (1997 to 2001), and Measures]
(Products.Bikes, Measures.Units, Time.[2000])
(Products.Bikes, Measures.Sales, Time.[1999])
A Cell is referenced by all the
dimensions
What if I only specify this?
(Products.Bikes, Measures.Units)
Default Member
What if I only specify this?
(Products.Bikes, Measures.Units)
If Time’s default member is [1997]
Ans: (Products.Bikes, Measures.Units, Time.[1997])
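The way a partial tuple is completed with default members can be sketched as a dictionary merge. This only models the rule on the slide; the `default_members` mapping, the "All Products" default, and the `complete_tuple` helper are assumptions for the example, not part of any MDX API.

```python
# Hypothetical cube metadata: dimension name -> default member.
default_members = {"Products": "All Products", "Measures": "Units", "Time": "1997"}


def complete_tuple(partial, defaults):
    """Fill in the default member for every dimension missing from a partial tuple."""
    return {dim: partial.get(dim, default) for dim, default in defaults.items()}


# Only Products and Measures are specified; Time falls back to its default member.
cell = complete_tuple({"Products": "Bikes", "Measures": "Units"}, default_members)
print(cell)
```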
The Basic Elements of an
MDX Query
Select
{[Ship Date].[Calendar]} on columns,
{[Product].[Product Categories]} on rows
from [Adventure Works]
Using Braces { }
• Braces Denote a Set
• Braces Can be Omitted when the Set is Unambiguous.
• In SSAS 2005 / 2008:
SELECT
[Ship Date].[Calendar] ON COLUMNS,
[Product].[Product Categories] ON ROWS
FROM [Adventure Works]
• In AS 2000:
SELECT
{[Ship Date].[Calendar]} ON COLUMNS,
{[Product].[Product Categories]} ON ROWS
FROM [Adventure Works]
Using Brackets [ ]
• Brackets Enclose a String Value
• Necessary for:
– Field names with spaces: [New York], [Mary Lo]
– Numbers as field names: [2007], [2008]
• Otherwise, SSAS will treat them as numeric constants
Default Members
• Every Dimension has a Default Member
– Usually the “All” member is the default member.
• Default Measures
– The measures dimension also has a default
measure
– In our sample cube [Adventure Works], the default measure is [Reseller Sales Amount]
Members
You want to query more than a single cell.
Use the Members function.
The Members function returns the set of members in a dimension, level, or hierarchy.
select
[Ship Date].[Calendar] on columns,
[Product].[Product Categories].members on rows
from [Adventure Works]
Test Yourself: Number 1
[Ship Date].[Calendar] also has a membership; that is, it is made up of more granular information. Modify the query to return the membership of the [Ship Date].[Calendar] dimension.
select
[Ship Date].[Calendar] on columns,
[Product].[Product Categories].members on rows
from [Adventure Works]
Desired result:
Naming Additional
Dimensions
Number  | Name
AXIS(0) | COLUMNS
AXIS(1) | ROWS
AXIS(2) | PAGES
AXIS(3) | SECTIONS
AXIS(4) | CHAPTERS
Retrieving Data from a
Cube
select
[Ship Date].[Calendar].[Calendar Year].[CY 2004] on axis(0),
[Promotion].[Promotions].[reseller] on axis(1)
from [Adventure Works]
[Figure: grid with [Promotion].[Promotions] members (No Discount, Reseller) on rows and [Ship Date].[Calendar] years (2001 to 2004) on columns]
Test Yourself: Number 2
• Modify the query to return the sales of Bikes with No
Discount
select
[Ship Date].[Calendar].[Calendar Year].[CY 2004] on axis(0),
[Promotion].[Promotions].[reseller] on axis(1)
from [Adventure Works]
Expected Result
Fully Qualified Names
• [CY 2001] below could be
– [Delivery Date].[Calendar].[CY 2001] or
– [Ship Date].[Calendar].[CY 2001]
select
[CY 2001] on axis(0)
from [Adventure Works]
• [Product].[Product Categories].[bikes] is the same as
[Product].[Product Categories].[All Products].[bikes]
Two Dimensions with Where
Clause
select
[Ship Date].[Calendar].[Calendar Year].members on axis(0),
[Promotion].[Promotions].[reseller] on axis(1)
from [Adventure Works]
where [Product].[Product Categories].[bikes]
[Figure: cube with [Promotion].[Promotions] (No Discount, Reseller), [Product].[Product Categories] (Components, Clothing, Bikes), and [Ship Date].[Calendar] (2001 to 2004) as its three dimensions]
Demo
• Lab MDX Query
Few Useful References
• www.microsoft.com/sqlserver/2008/en/us/analysisservices.aspx
• All BI WebCasts http://www.microsoft.com/events/series/bi.aspx?tab=webcasts&id=all
• MDX References –
msdn.microsoft.com/en-us/library/ms145506.aspx
Thank You