Download Multi-Dimensional OLAP

Document related concepts

Big data wikipedia , lookup

Microsoft Jet Database Engine wikipedia , lookup

Extensible Storage Engine wikipedia , lookup

SQL wikipedia , lookup

Database wikipedia , lookup

Entity–attribute–value model wikipedia , lookup

Open Database Connectivity wikipedia , lookup

Microsoft SQL Server wikipedia , lookup

Clusterpoint wikipedia , lookup

Relational model wikipedia , lookup

Database model wikipedia , lookup

Transcript
Data Warehouse Fundamentals
Chapter 8
OLAP IN THE DATA WAREHOUSE
Paul Chen
1
Summary of Topics
1.
2.
3.
OLAP Definition, Key features and Benefits
How OLAP differs from OATP?
Multidimensional data ? What? Why? How To Use?
4.
OLAP Server Query, Features and Applications.
5.
Category of OLAP tools -Multi-Dimensional OLAP (MOLAP),
Relational OLAP (ROLAP), and Managed Query Environment
(MQE).
OLAP extensions to SQL.
The Microsoft Data Warehousing Framework.
6.
7.
2
Data Warehousing and End-User Access Tools

Accompanying growth in data warehouses is
increasing demands for more powerful access tools
providing advanced analytical capabilities.

Key developments include:
– Online analytical processing (OLAP)
– SQL extensions for complex data analysis
– Data mining tools.
3
Limitations of Other Analysis Methods


SQL has been the accepted interface for retrieving and
manipulating data from relational databases. These methods are
used in OLTP systems and in data warehousing environments
(referring to the environments with simple queries and routine
reports).
Now consider information retrieval and manipulation in these
environments- reports writers and spreadsheets.
Report writers: two features- the ability to point and click for
generating and issuing SQL calls, and the capability to format the
output reports. However, report writers do not support
multidimensionality. With basic report writers, you cannot drill
down to lower levels in the dimensions. You cannot rotate the
results by switching rows and columns. The report writers do not
provide aggregate navigation. Once the report is formatted and
run, you cannot alter the presentation of the result data sets.
Spreadsheets are still cumbersome for showing all the aggregate
levels and multidimensional views, let alone doing calculations for
4
“roll-up” and “drill-down”.
Topic 1: OLAP Definition
“On-Line Analytical Processing (OLAP) is a category of software
technology that enables analysts, managers and executives to gain
insight into data through fast, consistent, interactive access to a
wide variety of possible views of information that has been
transformed from raw data to reflect the real dimensionality of
the enterprise as understood by the user”
-- DBMS Magazine, April, 1995
5
Key Features of OLAP

Supports analysis, dynamic synthesis and
consolidation of large volumes of multi-dimensional
data. Types of analysis ranges from basic navigation
and browsing (slicing and dicing) to calculations, to
more complex analyses such as time series and complex
modeling.

Is able to drill down or roll up with each dimension.

Is capable of applying mathematical formulas and
calculations to measures.
6
Key Features of OLAP


Can easily answer ‘who?’ and ‘what?’.
Ability to answer ‘what if?’ and ‘why?’ type questions.

Distinguishes OLAP from general-purpose query
tools.

Enables users to gain a deeper understanding and
knowledge about various aspects of their corporate
data through fast, consistent, interactive access to a
wide variety of possible views of the data.

Can be implemented on the web.
7
OLAP Benefits





Increased productivity of end-users.
Reduced backlog of applications development for IT
staff.
Retention of organizational control over the integrity
of corporate data.
Reduced query drag and network traffic on OLTP
systems or on the data warehouse.
Improved potential revenue and profitability.
8
Topic 2: OLAP VS OLTP (ON-LINE
TRANSACTION PROCESSING)

OLTP (RELATIONAL)




ATOMIZED
PRESENT
‘RECORD-AT-A-TIME’

PROCESS ORIENTED




OLAP
(MULTIDIMENSIONAL)
SUMMARIZED
HISTORICAL
‘MANY RECORDS-AT-ATIME’
SUBJECT ORIENTED
9
OLAP VS OLTP
WHILE OLTP APPLICATIONS GENERALLY DO
NOT REQUIRE HISTORICAL DATA, NEARLY
EVERY OLAP APPLICATION IS CONCERNED
WITH VIEWING TRENDS AND THEREFORE
REQUIRES HISTORICAL DATA. OLTP
APPLICATIONS AND DATABASE TEND TO BE
ORGANIZED AROUND SPECIFIC PROCESSES
(SUCH AS ORDER ENTRY), OLAP
APPLICATIONS TEND TO BE “SUBJECTORIENTED” ANSWERING SUCH QUESTIONS AS
“WHAT PRODUCTS ARE SELLING WELL” OR “
WHAT ARE MY WEAKEST SALES OFFICES?”
10
Topic 3: Multi-dimensional Database

In a multidimensional database, data is stored as
Facts and Dimensions instead of rows and columns

Multi-dimensional structures are best visualized as
cubes of data, and cubes within cubes of data. Each
side of a cube is a dimension.
11
Sample Star Schema
Sales Fact Table
Time
Time key
Date
Month
quarter
year
Product Key
Store Key
Time Key
Fixed Cost
Variable cost
Profit margin
YTD_Sales_dollars_by_store
YTD_Sales_dollar_by_category
YTD_Sales_By_department
Store
Store key
Store name
region
Store
Product
Month
Product
Product Key
Product Name
Category
Product line
12
Kinds of Queries



Display the total sales of all products for past five years
in all stores.
Compare total sales for all stores, product by product,
between years 2000 and 1999
Show comparison of total sales for all stores, product
by product, between years 2000 and 1999 only for
those products for reduced sales.
13
Cube

Users often view and analyze data multidimensionality,
using hierarchical segmentation along each dimension.
Thus a user may analyze sales along the time
dimension (such as months within quarters with years),
along geographical dimension (cities with regions
within countries), along the organizational dimension
(sales persons within branches within territories). We
can conceptualize the approach as a cube.
14
Cube
Fact Table View
Multi-Dimensional Cube
Property sale
sale p
Branch weekprice
p1 c1
p2 c2
p3 c3
p4 c1
1
2
2
1
1
2
4
3
week 2
week 1
p2
c1
2
p3
C1
P1
P4
c2 c3
1
3
4
C2
C3
P = property #
Cells: roughly equivalent to records in a relational
database.
15
WHAT IS MULTIDIMENSIONAL DATA?
RELATIONAL DATABASES ARE ORGANIZED AROUND A
LIST OF “RECORDS”. EACH RECORD CONTAINS
RELATED INFORMATION WHICH ARE ORGANIZED INTO
“FIELDS”.
CUS NAME
JACK
WALTER
HOOVER
BELLRED

CUSTOMER #
10001
TELEPHONE
345-4444
10002
10003
345-6666
345-8588
ADDRESS
40 MAIN
30 ELM
6
THIS TABLE HAS ONLY ONE DIMENSION.
16
WHAT IS MULTIDIMENSIONAL DATA?
LOOKING AT “CUSTOMER BY TELEPHONE #” OR
“TELEPHONE # BY CUSTOMER” ONLY PRODUCES
A ONE-FOR-ONE CORRESPONDENCE.
17
WHAT IS MULTIDIMENSIONAL DATA?
Let’s take a look at an example of a relational table
where
there is more than one-to-one correspondence
between
the fields.
In the following table, we have sales data for each
product
In each region-- four products sold in three regions.
The
Sales data is a two-dimensional matrix (“Product”
and
18
product/region/sales table
Product
Nuts
Nuts
Nuts
Screws
Screws
Screws
Bolts
Bolts
Bolts
Washers
Washers
Washers
Region
East
West
Central
East
West
Central
East
West
Central
East
West
Central
Sales
50
40
30
60
50
60
100
120
80
90
100
40
19
A Much Better Way To Represent the Data:
Two-Dimensional Matrix
PRODUCT
NUTS
SCREWS
BOLTS
WASHERS
EAST WEST CENTRAL
50
40
30
60
50
60
100
120
80
90
100
40
20
QUERY ON MULTIDIMENSIONAL
DATA
QUESTIONS LIKE ‘WHAT WERE TOTAL SALES OF NUTS?’
OR WHAT WERE TOTAL SALES FOR THE EAST?’ TO FIND
THE ANSWER IN THE TWO DIMENSIONAL TABLE, JUST
FIND THE CELL CALLED ‘EAST’ AND ADD UP ALL THE
NUMBERS IN THE COLUMN.
21
AGGREGATION: THE KEY TO
CONSISTENTLY FAST RESPONSE
PRODUCT
NUTS
SCREWS
BOLTS 100
WASHERS
TOTAL
EAST WEST CENTRAL
50
40
30
60
50
60
120
80
90
100
40
300
310
210
TOTAL
120
170
300
230
820
22
Multiple Reads & Database Writes
In the above example, computing the totals involves 28 (4*4
+3*4) database reads and eight database writes. A typical
relational database can read about 200 records per second
and writes perhaps 20 records per second. So consolidating
this tiny database would take less than one second.
However, for some larger tables, computing for totals
could take days or even weeks to consolidate.
23
Relational Database vs. Multidimensional
OLAP Server

With a multidimensional OLAP server, we can perform the same
consolidation with row and column arithmetic. Whereas a
relational database can access a few hundred records per second,
a good OLP server should be capable of consolidating 20,000 to
30,000 cells (equivalent to records in the relational table) per
second, including the time to write the totals to the database. The
multidimensional OLAP database will take up less space since the
names of the regions and products are not repeated in the
multidimensional database as they are in the relational database.
24
Multiple Hierarchies And Classes
Within Dimensions
 The
single biggest factor in determining how many
dimensions for a database is the existence of
multiple hierarchies and classes within dimensions.
 Classes are typically attributes such as “size”,
“color” and other characteristics that define a
subset of the members of a dimension.
 For example, a database of shampoo sales might
want to roll up product sales by size (6 oz, 15 oz), by
type(dry hair, oily hair) and possibly by other
attributes such as scented/unscented, brand name,
and so on.
25
TERMINOLOGY

Dimension: roughly equivalent to Fields in a relational database.
In the multidimensional data, “Product” and “Region” are both
dimensions.

Cells: roughly equivalent to Records in a relational
database.
26
Simple Hierarchies (Roll up) Within
Dimensions

The preceding table can be represented graphically as follows:
Region Total
East
Central
West
Product Total
nuts
Bolts
Screws
Individual products roll up into a Product Total
Washers
27
Multiple Levels of Hierarchies
Region Total
West
EAST
Calif
Central
Oregon
Washington
Seattle
Bellevue
28
Slice and Dice

A three-dimensional array has a total of six faces, or views. A
four-dimensional array has twelve views. An n-dimensional array
has n(n-1) views. The ability to rotate the data cube is the main
technology for multi-dimensional reporting and is sometimes
called “slice and dice.”
Region
Product
Actual/Forecast
29
Practical Limitations on Database Size

In general, as the number of dimensions increases, the number of
cells in the database increases exponentially.
for ex., a two-dimensional database with 100 products and 100
regions would have 10,000 cells. If we add a third dimension for
time with 52 weeks, we now have 520,000 cells.
Most commercial OLAP servers hit the cell limit long before they
run out of dimensions.
30
Topic 4: A list of some important features
supported by some OLAP Servers





Special time-series data types
Special dimensions for variables (complex mathematical
relationships, such as computed averages, and simultaneous
equations)
Multiple hierarchies within a dimension
Classes with a dimension
Virtual variables (computed on the fly at run time, such as “gross
margin derived from “revenues” and “expenses.”
31
Examples of OLAP applications in various
functional areas
32
OLAP Applications

Although OLAP applications are found in widely
divergent functional areas, all have following key
features:
– multi-dimensional views of data
– support for complex calculations
– time intelligence.
33
OLAP Applications - multi-dimensional
views of data

Core requirement of building a ‘realistic’ business
model.

Provides basis for analytical processing through
flexible access to corporate data.

The underlying database design that provides the
multi-dimensional view of data should treat all
dimensions equally.
34
OLAP Applications - support for complex
calculations

Must provide a range of powerful computational
methods such as that required by sales forecasting,
which uses trend algorithms such as moving averages
and percentage growth.

Mechanisms for implementing computational methods
should be clear and non-procedural.
35
OLAP Applications – time intelligence

Key feature of almost any analytical application as
performance is almost always judged over time.

Time hierarchy is not always used in same manner as
other hierarchies.

Concepts such as year-to-date and period-over-period
comparisons should be easily defined.
36
Representing Multi-Dimensional Data

Example of two-dimensional query.
– What is the total revenue generated by property
sales in each city, in each quarter of 1997?’

Choice of representation is based on types of queries
end-user may ask.

Compare representation - three-field relational table
versus two-dimensional matrix.
37
Multi-dimensional Data as Three-field table
versus Two-dimensional Matrix
38
Representing Multi-Dimensional Data

Example of three-dimensional query.
– ‘What is the total revenue generated by property
sales for each type of property (Flat or House) in
each city, in each quarter of 1997?’

Compare representation - four-field relational table
versus three-dimensional cube.
39
Multi-dimensional Data as Four-field Table
versus Three-dimensional Cube
40
Representing Multi-Dimensional Data

Cube represents data as cells in an array.

Relational table only represents multi-dimensional
data in two dimensions.
41
Multi-Dimensional OLAP Servers

Use multi-dimensional structures to store data and
relationships between data.

Multi-dimensional structures are best visualized as
cubes of data, and cubes within cubes of data. Each
side of cube is a dimension.

A cube can be expanded to include other dimensions.
42
Multi-Dimensional OLAP Servers

A cube supports matrix arithmetic.

Multi-dimensional query response time depends on
how many cells have to be added ‘on the fly’.

As number of dimensions increases, number of the
cube’s cells increases exponentially.
43
Multi-Dimensional OLAP Servers

However, majority of multi-dimensional queries use
summarized, high-level data.

Solution is to pre-aggregate (consolidate) all logical
subtotals and totals along all dimensions.

Pre-aggregation is valuable, as typical dimensions are
hierarchical in nature.
– (e.g. Time dimension hierarchy - years, quarters,
months, weeks, and days)
44
Multi-Dimensional OLAP Servers

Predefined hierarchy allows logical pre-aggregation
and, conversely, allows for a logical ‘drill-down’.

Supports common analytical operations
– Consolidation
– Drill-down
– Slicing and dicing.
45
Multi-Dimensional OLAP Servers

Consolidation - aggregation of data such as simple
‘roll-ups’ or complex expressions involving interrelated data.

Drill-Down - is reverse of consolidation and involves
displaying the detailed data that comprises the
consolidated data.

Slicing and Dicing - (also called pivoting) refers to the
ability to look at the data from different viewpoints.
46
Multi-Dimensional OLAP servers

Can store data in a compressed form by dynamically
selecting physical storage organizations and
compression techniques that maximize space
utilization.

Dense data (ie., data that exists for high percentage of
cells) can be stored separately from sparse data (ie.,
significant percentage of cells are empty).
47
Multi-Dimensional OLAP Servers

Ability to omit empty or repetitive cells can greatly
reduce the size of the cube and the amount of
processing.

Allows analysis of exceptionally large amounts of data.
48
Multi-Dimensional OLAP Servers

In summary, pre-aggregation, dimensional hierarchy,
and sparse data management can significantly reduce
the size of the cube and the need to calculate values
‘on-the-fly’.

Removes need for multi-table joins and provides quick
and direct access to arrays of data, thus significantly
speeding up execution of multi-dimensional queries.
49
Codd’s Rules for OLAP Systems

In 1993, E.F. Codd formulated twelve rules as the basis
for selecting OLAP tools.
– Multi-dimensional conceptual view
– Transparency
– Accessibility
– Consistent reporting performance
– Client-server architecture
– Generic dimensionality
50
Codd’s rules for OLAP Systems
–
–
–
–
–
–
Dynamic sparse matrix handling
Multi-user support
Unrestricted cross-dimensional operations
Intuitive data manipulation
Flexible reporting
Unlimited dimensions and aggregation levels.
51
Codd’s Rules for OLAP Systems

There are proposals to re-defined or extended the
rules. For example, to also include:
–
–
–
–
Comprehensive database management tools
Ability to drill down to detail (source record) level
Incremental database refresh
SQL interface to the existing enterprise environment
52
Categories of OLAP Tools

OLAP tools are categorized according to the
architecture of the underlying database.

Three main categories of OLAP tools include
– Multi-dimensional OLAP (MOLAP or MD-OLAP)
– Relational OLAP (ROLAP), also called multirelational OLAP
– Managed query environment (MQE)
53
Topic 5: Multi-Dimensional OLAP
(MOLAP)

Use specialized data structures and multi-dimensional
Database Management Systems (MDDBMSs) to
organize, navigate, and analyze data.

Data is typically aggregated and stored according to
predicted usage to enhance query performance.
54
Multi-Dimensional OLAP (MOLAP)

Use array technology and efficient storage techniques
that minimize the disk space requirements through
sparse data management.

Provides excellent performance when data is used as
designed, and the focus is on data for a specific
decision-support application.
55
Multi-Dimensional OLAP (MOLAP)

Traditionally, require a tight coupling with the
application layer and presentation layer.

Recent trends segregate the OLAP from the data
structures through the use of published application
programming interfaces (APIs).
56
Typical Architecture for MOLAP Tools
57
MOLAP Tools - Development Issues

Underlying data structures are limited in their ability
to support multiple subject areas and to provide access
to detailed data.

Navigation and analysis of data is limited because the
data is designed according to previously determined
requirements.
58
MOLAP Tools - Development Issues

MOLAP products require a different set of skills and
tools to build and maintain the database, thus
increasing the cost and complexity of support.
59
Relational OLAP (ROLAP)

Fastest-growing style of OLAP technology.

Supports RDBMS products using a metadata layer avoids need to create a static multi-dimensional data
structure - facilitates the creation of multiple multidimensional views of the two-dimensional relation.
60
Relational OLAP (ROLAP)

To improve performance, some products use SQL
engines to support complexity of multi-dimensional
analysis, while others recommend, or require, the use
of highly denormalized database designs such as the
star schema.
61
Typical Architecture for ROLAP Tools
62
ROLAP Tools - Development Issues

Middleware to facilitate the development of multidimensional applications. (Software that converts the
two-dimensional relation into a multi-dimensional
structure).

Development of an option to create persistent, multidimensional structures with facilities to assist in the
administration of these structures.
63
Managed Query Environment (MQE)

Relatively new development.

Provide limited analysis capability, either directly
against RDBMS products, or by using an intermediate
MOLAP server.
64
Managed Query Environment (MQE)

Deliver selected data directly from DBMS or via a
MOLAP server to desktop (or local server) in form of a
datacube, where it is stored, analyzed, and maintained
locally.

Promoted as being relatively simple to install and
administer with reduced cost and maintenance.
65
Typical Architecture for MQE Tools
66
MQE Tools - Development Issues

Architecture results in significant data redundancy
and may cause problems for networks that support
many users.

Ability of each user to build a custom datacube may
cause a lack of data consistency among users.

Only a limited amount of data can be efficiently
maintained.
67
Topic 6: OLAP Extensions to SQL

SQL promoted as easy-to-learn, nonprocedural, freeformat, DBMS-independent, and international
standard.

However, major disadvantage has been inability to
represent many of the questions most commonly asked
by business analysts.

IBM and Oracle jointly proposed OLAP extensions to
SQL early in 1999, adopted as an amendment to SQL.
68
OLAP Extensions to SQL

Many database vendors including IBM, Oracle,
Informix, and Red Brick Systems have already
implemented portions of specifications in their DBMSs.

Red Brick Systems was first to implement many
essential OLAP functions (as Red Brick Intelligent
SQL (RISQL)), albeit in advance of the standard.
69
OLAP Extensions to SQL - RISQL

Designed for business analysts.

Set of extensions that augments SQL with a variety of
powerful operations appropriate to data analysis and
decision-support applications such as ranking, moving
averages, comparisons, market share, this year versus
last year.
70
Use of the RISQL CUME Function

Show the quarterly sales for branch office B003, along
with the monthly year-to-date figures.
SELECT quarter, quarterlySales, CUME(quarterlySales)
AS Year-to-Date
FROM BranchSales
WHERE branchNo = 'B003';
71
Use of the RISQL MOVINGAVG / MOVINGSUM
Function

Show the first six monthly sales for branch office B003
without the effect of seasonality.
SELECT month, monthlySales,
MOVINGAVG(monthlySales)
MonthMovingAvg,
MOVINGSUM(monthlySales)
MonthMovingSum
FROM BranchSales
WHERE branchNo = 'B003';
AS
3-
AS
3-
72
Topic 7: The Microsoft Data Warehousing
Framework

The Microsoft Data Warehousing Framework is an open
architecture that is easily integrated with existing systems.
The Microsoft SQL Server DTS tool is used to import,
export, and repair or transform data (where it is
necessary). The Framework contains a rich object-oriented
programming interface for customized data warehousing
implementations. There is also a user interface, the
Microsoft SQL Server Analysis Services Manager that can
be used to configure the data warehouse and to populate
or update the content in a cube. It can be used to schedule
tasks, monitor performance, and perform queries on the
data warehouse.
73
Microsoft Data Warehouse
Product

Data warehousing software has been included with
Microsoft® SQL Server™ since the release of
version 7.0 in 1998.
74
The Cube as a Model for the Data
Warehouse

The cube is an imperfect yet satisfactory name for a
data warehouse repository. How does a data
warehouse cube differ from a geometrical cube? There
are a few important differences. A data warehouse
cube is defined by any number of dimensions (it is not
limited to three, and sometimes a data-warehousing
cube may have fewer than three dimensions).
Dimensions describe a data-warehousing cube just as
width, height, and depth describe a geometrical cube.
Where it is appropriate, dimensions can be organized
into any number of levels.
75
The Cube as a Model for the Data
Warehouse

The relationship between two dimensions can be
modeled using a grid. Dimensions are like the labels
of along the axes of the grid. The cells are facts. Facts
correspond to the cross product of each dimension of
the cube. The data in the cell is a measure. Measures
are the whole reason for the cube. If the cube is about
the number of items sold, the measure is a count of the
number of items sold. To repeat the grid example, the
measure is the number that you would find in the grid
cell.
76
Dimensions and Levels

Levels are used to organize dimensions into smaller units where
necessary. Levels may also contain other levels, depending on
how they are configured in the cube. For example, consider that
there is a region dimension. Perhaps this grocery store operates
in three states and uses the state boundaries as territorial
boundaries. Let's say that the region dimension contains three
levels: California, Oregon, and Washington. If the business has
additional sub-regions such as Seattle, Olympia, Yakima, and
Spokane in the state of Washington, these levels can be added as
levels to the Washington region, even if such detail is not needed
for the California and Oregon region. Levels are just a
convenient way to organize facts for a dimension.
77
Facts and Measures

A fact is about the combination of the various dimensions.
Locating a fact is like using a coordinate system. Just like a
position in a mathematical cube such as the origin, which might
be represented by (x=0,y=0,z=0), a fact would be represented by
specific combination of dimensions such as
(Product=broccoli, Region=Seattle, Time=Wednesday) yielding a
specific fact about broccoli being sold in Seattle on Wednesday.
Depending on the way that the cube is being used, the fact may
show a measure of something like 580 units sold or perhaps a
different measure like $860.00 in sales. The meaning of the
measure depends on how the cube is defined.
78
Aggregations

The mathematical operations of count and sum are an essential
part of the reason data warehouses are useful. These are
aggregations. Once dimensions are organized and a cube is being
processed, the aggregations are calculated. Generally,
aggregations are calculated immediately after the cube is initially
populated or there is a change to the content of the cube.
79
Using a Data Warehouse to Make Decisions

Consider a grocery store. Let's say that a
promotion has been running for a few days and
the grocer needs to decide whether or not to run
the promotion again. A question that the grocer
might pose would be something like, "Has more
of the product been sold during the promotion
period compared to prior periods?"
80
Using a Data Warehouse to Make Decisions
A grocery store inventory system may record prices, products,
sales, and
promotions in a transactional database using normalized
structures.
Inventory systems are optimized for inserting and updating records
and
perhaps for simple procedural selections such as retrieving the
cost of an
item. It is much less likely that the system is organized in such a
way that it
would be just as efficient to produce a report that details on a dayby-day
and product-by-product basis the effectiveness of a sale. In fact,
there is
usually a contradiction between systems that are designed for
81
transactional
efficiency and those that are designed for efficient queries. This is
Using a Data Warehouse to Make Decisions

In this case, it is easy to use a data warehouse to
answer the grocer's question. The sum of the
fact records that measure the number of items
sold using the cube dimensions of products,
promotion, and time can produce the needed
results.
82
Using a Data Warehouse to Make Decisions

To contrast this technique, the information in the
other systems may not even be in the same
database. The inventory data source may not be
the same data source as the customer data
source or the employee data source. Even if the
systems are in the same database, it would still
be a chore to build a system of queries that will
combine and aggregate the results in a way that
will produce the correct answer. In fact, this
effort of combining data sources and
aggregating the results is just what the data
warehousing software does best.
83
Viewing a Slice and the Programming
Interface to A Cube

Although fashioning a data warehouse in multiple
dimensions may be a simple design choice, and
performing queries that produce results that span several
dimensions may not be a significant chore for the
processor, the very constitution of multi-dimensional
output can make it difficult to display. Charts, graphs, and
tables are almost always presented in two dimensions.
There are some good three-dimensional charting tools, but
beyond that, the chart becomes more of a puzzle than a
visual aid. A common technique of viewing multidimensional output is to view the output one twodimensional "slice" of a cube at a time. This is the way that
the Microsoft SQL Server Analysis Services Tool displays
output.
84
Using DSO (Decision Support Object)

Fortunately, output is not restricted to two
dimensions. Microsoft SQL Server Analysis
Services provides a programming interface to
multi-dimensional data warehouse output: the
DSO (Decision Support Object). DSO can be
used programmatically to access the various
dimensions.
85
Using MDX

MDX (multidimensional extensions) is a syntax
designed for querying multidimensional objects
and data. For such systems, it is more efficient
and intuitive to use than SQL, which was
designed for an entirely different set of objects.
The grammar of an MDX query has a similar feel
to the grammar of an SQL query. Observe the
MDX query below which would select the sales
figures for the broccoli sold in Seattle on
Wednesday that I used in the example above:
86
Using MDX

Microsoft SQL Server OLAP Services provides an
architecture for access to multidimensional data.
This data is summarized, organized, and stored
in multidimensional structures for rapid response
to user queries. Through OLE DB for OLAP, a
PivotTable Service® provides client access to
this multidimensional online analytical
processing
(OLAP) data. For expressing queries to this data,
OLE DB for OLAP employs a full-fledged, highly
functional expression syntax: multidimensional
expressions (MDX).
87
Using MDX

OLE DB for OLAP is a set of Component Object
Model (COM) interfaces designed to extend OLE
DB for efficient access to multidimensional data.
ADO has been extended with new objects,
collections, and methods that are designed to
take advantage of OLE DB for OLAP. These
extensions are collectively known as ADO MD
(multidimensional) and are designed to provide a
simple, high-level object model for accessing
tabular and OLAP data.
88
MDX expression




SELECT axis specification ON COLUMNS,
axis specification ON ROWS
FROM cube_name
WHERE slicer_specification
89
SELECT





[Measures].[Sales] ON COLUMNS
[Time].[Wednesday] ON ROWS
FROM MySalesCube
WHERE [Region].[Washington].[Seattle]
AND [Product].[Vegetable].[Broccholi]

The output of this query would be a column
labeled Sales, a row labeled Wednesday and a
single grid cell at the intersection with the sales
figure
$860.00.
90
SELECT

The SQL Server Analysis Server Manager has an
interface that accepts MDX queries.
Alternatively, MDX queries can be incorporated
into programs that employ the DSO.
91
The Microsoft Data Warehousing
Framework

There is more information about how to build a cube using
the Microsoft SQL Server Analysis Services Manager at
MSDN Online, as well as guidelines
to consider for the design and configuration of your data
warehouse. See the "How to" article at this link:
http://msdn.microsoft.com/library/psdk/sql/aghtintro_2vov.
htm.
92
Tugas kelompok (maksimal 5 orang)
 Carilah
Paper yang berisi penggunaan
datawarehouse pada bidang tertentu
 Presentasikan kepada dosen pengajar di luar
waktu mengajar datawarehouse
 Range presentasi tanggal 15 s.d 19 Oktober
 Selamat mengerjakan
93
Tugas kelas malam
 Carilah
paper berkaitan dengan penggunaan
Datawarehouse
 Presentasikan minggu depan
 Selamat mengerjakan
94